Popular Summary

It has often been said that the reason we cannot predict tomorrow’s 
weather is that we don’t know today’s weather. More precisely, if the com- 
plete state of the earth’s atmosphere (e.g., pressure, temperature, winds and 
humidity, everywhere throughout the atmosphere) were known at any par- 
ticular initial time, then solving the equations that govern the dynamical 
behavior of the atmosphere would give the complete state at all subsequent 
times. Part of the difficulty of weather prediction is that the governing equa- 
tions can only be solved approximately, which is what weather prediction 
models do. But weather forecasts would still be far from perfect even if 
the equations could be solved exactly, because the atmospheric state is not 
and cannot be known completely at any initial forecast time. Rather, the 
initial state for a weather forecast can only be estimated from incomplete 
observations taken near the initial time, through a process known as data 
assimilation. 

Weather prediction models carry out their computations on a grid of 
points covering the earth’s atmosphere. The formulation of these models is 
guided by a mathematical convergence theory which guarantees that, given 
the exact initial state, the model solution approaches the exact solution of 
the governing equations as the computational grid is made more fine. For 
the data assimilation process, however, there does not yet exist a convergence 
theory. For instance, it is not yet known how to formulate a data assimilation 
method in such a way that increasing the number of observations available 
to estimate the initial state is guaranteed to improve the accuracy of the 
estimated state, even in a statistical sense. Instead, the development of 
data assimilation methods has proceeded on the basis of a number of ad hoc 
assumptions and approximations. 

This book chapter represents an effort to begin establishing a conver- 
gence theory for data assimilation methods. The main result, which is called 
the principle of energetic consistency, provides a necessary condition that
a convergent method must satisfy. Current methods violate this principle, 
as shown in earlier work of the author, and therefore are not convergent. 
The principle is illustrated by showing how to apply it as a simple test of 
convergence for proposed methods. 
1 Introduction


The statement of conservation of total energy for nonlinear stochastic dynamical 
systems, when expressed in the natural energy variables of the system, provides 
an exact dynamical link between just the first two moments of the state of 
the system. This statement is what will be called here the principle of energetic 
consistency. This principle should be useful to the data assimilation community, 
because most current four-dimensional data assimilation methods are in fact 
based on an approximate evolution of the first two moments, conditioned on 
the observations. In particular, the principle provides one simple test of how 
well current methods approximate the actual evolution. 

Suppose that the system state $\mathbf{s} = \mathbf{s}(t)$ is a vector governed by a nonlinear
conservative system of ordinary differential equations

$$\frac{d\mathbf{s}}{dt} + \mathbf{f}(\mathbf{s}, t) = \mathbf{0},$$

where $t$ is time, and suppose that the state variables have been chosen in such
a way that the conserved quantity is $E = \mathbf{s}^T \mathbf{s}$, where the superscript $T$ denotes
transposition. Thus the statement of energy conservation is $E(t) = E(t_0)$. Now
suppose that the initial state $\mathbf{s}(t_0)$ is a vector-valued random variable, with mean
$\bar{\mathbf{s}}(t_0)$ and covariance matrix $\mathbf{P}(t_0)$. Then the principle of energetic consistency
says that, under certain hypotheses,

$$\bar{\mathbf{s}}^T(t)\,\bar{\mathbf{s}}(t) + \operatorname{tr} \mathbf{P}(t) = \bar{\mathbf{s}}^T(t_0)\,\bar{\mathbf{s}}(t_0) + \operatorname{tr} \mathbf{P}(t_0),$$

where $\operatorname{tr} \mathbf{A}$ is the trace, or sum of the diagonal elements, of a matrix $\mathbf{A}$. The
trace of the covariance matrix $\mathbf{P}(t)$ is sometimes called the total variance of
the system state. Thus the principle of energetic consistency says that any 
increase (decrease) in the uncertainty in the state of the system, as measured 
by the total variance, is compensated for exactly by a corresponding decrease 
(increase) in the energy of the mean state. Some ramifications of the principle of 
energetic consistency for ordinary differential equations, in the context of both 
data assimilation schemes and predictability theory, were given in a recent paper 
of Cohn (2008). The principle holds also for nonlinear, conservative discrete- 
time systems. 
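The finite-dimensional identity above can be checked numerically. The following is a minimal sketch, not taken from Cohn (2008): the two-dimensional model $d\mathbf{s}/dt = (1 + \|\mathbf{s}\|^2)\,\mathbf{J}\mathbf{s}$, the Gaussian initial ensemble, and the function names are all assumptions made for illustration. The model is nonlinear and conserves $\mathbf{s}^T\mathbf{s}$, and its exact flow is a rotation through a state-dependent angle, so no time-stepping error enters.

```python
import numpy as np

def flow(ensemble, t):
    """Exact flow of ds/dt = (1 + |s|^2) J s in R^2, applied to each row.

    J generates rotations, so (s, J s) = 0 and |s|^2 is conserved; every
    member rotates by its own angle (1 + |s0|^2) * t.
    """
    theta = (1.0 + np.sum(ensemble**2, axis=1)) * t
    c, sn = np.cos(theta), np.sin(theta)
    x, y = ensemble[:, 0], ensemble[:, 1]
    return np.column_stack((c * x - sn * y, sn * x + c * y))

def mean_energy_and_total_variance(ensemble):
    """Return (mean^T mean, tr P) from sample moments with 1/n normalization."""
    mean = ensemble.mean(axis=0)
    cov = np.cov(ensemble, rowvar=False, ddof=0)
    return float(mean @ mean), float(np.trace(cov))

rng = np.random.default_rng(0)
ens0 = rng.normal(loc=[1.0, 0.5], scale=0.3, size=(5000, 2))  # illustrative initial ensemble
ens1 = flow(ens0, t=2.0)

e0, v0 = mean_energy_and_total_variance(ens0)
e1, v1 = mean_energy_and_total_variance(ens1)
print(f"t0: mean energy {e0:.4f}, total variance {v0:.4f}, sum {e0 + v0:.6f}")
print(f"t : mean energy {e1:.4f}, total variance {v1:.4f}, sum {e1 + v1:.6f}")
```

With the $1/n$ normalization the sample moments satisfy the identity to rounding error, and in this example the energy of the mean decreases while the total variance increases by the same amount.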

The purpose of this chapter is to establish the principle of energetic consis- 
tency for a class of hyperbolic partial differential equations, and in particular, 
to determine precise conditions under which it holds. Ultimately a rigorous con- 
vergence theory for data assimilation methods will be needed. Given the role 
that energy considerations play in the convergence theory for discretizations of 
partial differential equations, the principle of energetic consistency is likely to 
play a role in a convergence theory for data assimilation methods. 

The principle of energetic consistency for infinite-dimensional spaces is given 
as Theorem 1 in Section 2.3. It states that under appropriate hypotheses, 

$$\|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t = \|\bar{\mathbf{s}}_{t_0}\|^2 + \operatorname{tr} \mathcal{P}_{t_0},$$

where the norm is a Hilbert space norm and $\mathcal{P}_t$ is the covariance operator of
the system state. Hypotheses needed for the principle of energetic consistency 
to hold, for ordinary differential equations and for classical solutions of sym- 
metric hyperbolic partial differential equations, are examined in Section 3. A 
main result is that for the latter case, the system state cannot be Gaussian- 
distributed, and the class of probability distributions that the system state can 
have is identified. The global nonlinear shallow-water equations are treated as 
an example in Section 4. There hypotheses are given so that the principle of 
energetic consistency holds when the state variables are taken to be the natural 
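energy variables. It is shown that $\operatorname{tr} \mathcal{P}_t$ takes the simple form

$$\operatorname{tr} \mathcal{P}_t = \iint \operatorname{tr} \mathbf{P}_t(\mathbf{x}, \mathbf{x})\, a \cos\phi \; d\phi \, d\lambda,$$

where $\mathbf{P}_t(\mathbf{x}, \mathbf{y})$ is the covariance matrix of the shallow-water system state.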

The rigorous theory established in this chapter requires some mathematical 
machinery. The natural framework for the principle of energetic consistency 
is the theory of Hilbert space-valued random variables, which is covered in 
Appendix A. Appendix B covers the theory of families of Hilbert spaces which 
is needed to handle spherical geometry conveniently. Appendix C summarizes 
mathematical basics needed throughout the text. 


2 The principle of energetic consistency 

2.1 Problem setting 

Let $\mathcal{H}$ be a real, separable Hilbert space, with inner product and corresponding
norm denoted by $(\cdot, \cdot)$ and $\|\cdot\|$, respectively. Recall that every separable Hilbert
space has a countable orthonormal basis, and that every orthonormal basis of a
separable Hilbert space has the same number of elements $N \le \infty$, the dimension
of the space. Let $\{\mathbf{h}_i\}_{i=1}^N$ be an orthonormal basis for $\mathcal{H}$, where $N = \dim \mathcal{H} \le \infty$
is the dimension of $\mathcal{H}$.

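Let $\mathcal{S}$ be any nonempty set in $\mathcal{B}(\mathcal{H})$, where $\mathcal{B}(\mathcal{H})$ denotes the Borel field
generated by the open sets in $\mathcal{H}$, i.e., $\mathcal{B}(\mathcal{H})$ is the smallest $\sigma$-algebra of subsets
of $\mathcal{H}$ containing all the sets that are open in $\mathcal{H}$. In particular, $\mathcal{S} \subset \mathcal{H}$, $\mathcal{S}$ can be
all of $\mathcal{H}$, and $\mathcal{S}$ can be any open or closed set in $\mathcal{H}$.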

Let $t_0$ and $T$ be two times with $-\infty < t_0 < T < \infty$, and let $\mathcal{T}$ be a time
set bounded by and including $t_0$ and $T$. For instance, $\mathcal{T} = [t_0, T]$ in the case
of continuous-time dynamics, and $\mathcal{T} = [t_0, t_1, \ldots, t_K = T]$ in the discrete-time
case. The set $\mathcal{T}$ is allowed to depend on the set $\mathcal{S}$, $\mathcal{T} = \mathcal{T}(\mathcal{S})$.

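Let $\mathbf{N}_{t,t_0}$ be a map from $\mathcal{S}$ into $\mathcal{H}$ (written $\mathbf{N}_{t,t_0} : \mathcal{S} \to \mathcal{H}$) for all times
$t \in \mathcal{T}$, i.e., for all $\mathbf{s}_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$, $\mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})$ is defined and

$$\mathbf{s}_t = \mathbf{N}_{t,t_0}(\mathbf{s}_{t_0}) \tag{1}$$

is in $\mathcal{H}$, $\|\mathbf{s}_t\| < \infty$. Assume that $\mathbf{N}_{t,t_0}$ is continuous and bounded for all $t \in \mathcal{T}$.
Continuity means that for every $t \in \mathcal{T}$, $\mathbf{s}_{t_0} \in \mathcal{S}$ and $\epsilon > 0$, there is a $\delta > 0$
such that if $\|\mathbf{s}_{t_0} - \mathbf{s}'_{t_0}\| < \delta$ and $\mathbf{s}'_{t_0} \in \mathcal{S}$, then $\|\mathbf{N}_{t,t_0}(\mathbf{s}_{t_0}) - \mathbf{N}_{t,t_0}(\mathbf{s}'_{t_0})\| < \epsilon$.
Boundedness means that there is a constant $M = M_{t,t_0}$ such that

$$\|\mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})\| \le M_{t,t_0} \|\mathbf{s}_{t_0}\|$$

for all $\mathbf{s}_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$. Continuity and boundedness are equivalent if $\mathbf{N}_{t,t_0}$ is
a linear operator.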

In most applications, $\mathbf{N}_{t,t_0}$ will be a nonlinear operator. Typically it will
be the solution operator of a well-posed initial-value problem, for the state
vector $\mathbf{s}$ of a nonlinear, deterministic system of partial ($\dim \mathcal{H} = \infty$) or ordinary
($\dim \mathcal{H} < \infty$) differential equations ($\mathcal{T} = [t_0, T]$).$^1$ Recall that continuity of the
solution operator is part of the (Hadamard) definition of well-posedness of the
initial-value problem for continuous-time or discrete-time dynamical systems:
not only must there exist sets $\mathcal{S}$ and $\mathcal{T} = \mathcal{T}(\mathcal{S})$, taken here to be defined
as above, and a unique solution $\mathbf{s}_t \in \mathcal{H}$ for all $\mathbf{s}_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$, which
taken together define the solution operator, but the solution must also depend
continuously on the initial data.

The operator $\mathbf{N}_{t,t_0}$ is called isometric or conservative (in the norm $\|\cdot\|$ on
$\mathcal{H}$) if

$$\|\mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})\| = \|\mathbf{s}_{t_0}\|$$

for all $\mathbf{s}_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$, and the differential (or difference) equations that
express the dynamics of a well-posed initial-value problem are called conservative
if the solution operator of the problem is conservative. With $\mathbf{s}_t \in \mathcal{H}$ defined for
all $\mathbf{s}_{t_0} \in \mathcal{S}$ and $t \in \mathcal{T}$ by Eq. (1), the quantity

$$E_t = \|\mathbf{s}_t\|^2 = (\mathbf{s}_t, \mathbf{s}_t) < \infty \tag{2}$$

satisfies $E_t \le M_{t,t_0}^2 E_{t_0}$ for all $t \in \mathcal{T}$ under the assumption of boundedness, and
is constant in time, $E_t = E_{t_0}$ for all $t \in \mathcal{T}$, in the conservative case.

In essence, the principle of energetic consistency is a statement about
continuous transformations of Hilbert space which are conservative. Applied to
solution operators, it becomes a statement about well-posed initial-value
problems for conservative dynamics. It is important to recognize that the quantity
$E_t$ defined in Eq. (2) is quadratic in $\mathbf{s}_t$. For nonlinear systems of differential
equations that express physical laws, there is usually a choice of dependent
(state) variables such that $E_t$ is the physical total energy. Then the dynamics
are conservative in the norm on $\mathcal{H}$ if the physical system is closed, and the
principle of energetic consistency applies.

2.2 Scalar and Hilbert space-valued random variables

Before stating the principle of energetic consistency, some probability concepts 
will first be summarized. For details, see Appendices A.1-A.3 and C.3.

1 As discussed further in Section 3, for partial differential equations $\mathcal{H}$ will usually be the
space $L^2(D)$ of square-integrable vectors on the spatial domain $D$ of the problem and $\mathcal{S}$ will
be an appropriate Sobolev or Sobolev-like space, while for ordinary differential equations $\mathcal{H}$
will usually be Euclidean space $\mathbb{R}^N$ and $\mathcal{S}$ will be an appropriate open set in $\mathbb{R}^N$.
Let $(\Omega, \mathcal{F}, P)$ be a complete probability space, with $\Omega$ the sample space,
$\mathcal{F}$ the event space and $P$ the probability measure. The event space consists of
subsets of the set $\Omega$, called events or measurable sets, which are those subsets on
which the probability measure is defined. Denote by $\mathcal{E}$ the expectation operator.

A (scalar) random variable is a map $r : \Omega \to \mathbb{R}^e$ that is measurable, i.e., an
extended real-valued function $r$, defined for all $\omega \in \Omega$, that satisfies

$$\{\omega \in \Omega : r(\omega) \le x\} \in \mathcal{F}$$

for all $x \in \mathbb{R}$. Thus, if $r$ is a random variable then its probability distribution
function

$$F_r(x) = P(\{\omega \in \Omega : r(\omega) \le x\})$$

is defined for all $x \in \mathbb{R}$. If $r$ is a random variable then $r^2$ is a random variable.

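Suppose that $r$ is a random variable. Then the expectation $\mathcal{E}|r|$ is defined
and $\mathcal{E}|r| \le \infty$. If $\mathcal{E}|r| < \infty$, then the expectation $\mathcal{E}r$ is defined and called the
mean of $r$, and $|\mathcal{E}r| \le \mathcal{E}|r| < \infty$. If $\mathcal{E}r^2 < \infty$, then $r$ is called second-order, the
mean $\bar{r} = \mathcal{E}r$ and variance $\sigma^2 = \mathcal{E}(r - \bar{r})^2$ of $r$ are defined, and

$$\mathcal{E}r^2 = \bar{r}^2 + \sigma^2. \tag{3}$$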

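An $\mathcal{H}$-valued random variable is a map $\mathbf{r} : \Omega \to \mathcal{H}$ such that

$$\{\omega \in \Omega : \mathbf{r}(\omega) \in B\} \in \mathcal{F}$$

for every set $B \in \mathcal{B}(\mathcal{H})$. A map $\mathbf{r} : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable if,
and only if, $(\mathbf{h}, \mathbf{r})$ is a scalar random variable for every $\mathbf{h} \in \mathcal{H}$, that is, if and
only if

$$\{\omega \in \Omega : (\mathbf{h}, \mathbf{r}(\omega)) \le x\} \in \mathcal{F}$$

for all $\mathbf{h} \in \mathcal{H}$ and $x \in \mathbb{R}$. If $\mathbf{r}$ is an $\mathcal{H}$-valued random variable then $\|\mathbf{r}\|$ is a scalar
random variable. An $\mathcal{H}$-valued random variable $\mathbf{r}$ is called second-order if $\|\mathbf{r}\|$ is
a second-order scalar random variable, i.e., if $\mathcal{E}\|\mathbf{r}\|^2 < \infty$. If $\mathbf{r}$ is a second-order
$\mathcal{H}$-valued random variable then $(\mathbf{h}, \mathbf{r})$ is a second-order scalar random variable,
i.e., $\mathcal{E}(\mathbf{h}, \mathbf{r})^2 < \infty$, for all $\mathbf{h} \in \mathcal{H}$.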

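Suppose that $\mathbf{r}$ is a second-order $\mathcal{H}$-valued random variable. Then there
exists a unique element $\bar{\mathbf{r}} \in \mathcal{H}$, called the mean of $\mathbf{r}$, such that $\mathcal{E}(\mathbf{h}, \mathbf{r}) = (\mathbf{h}, \bar{\mathbf{r}})$
for all $\mathbf{h} \in \mathcal{H}$. Also, $\mathbf{r}' = \mathbf{r} - \bar{\mathbf{r}}$ is a second-order $\mathcal{H}$-valued random variable with
mean $\mathbf{0} \in \mathcal{H}$, and

$$\mathcal{E}\|\mathbf{r}\|^2 = \|\bar{\mathbf{r}}\|^2 + \mathcal{E}\|\mathbf{r}'\|^2.$$

Furthermore, there exists a unique bounded linear operator $\mathcal{P} : \mathcal{H} \to \mathcal{H}$, called
the covariance operator of $\mathbf{r}$, such that

$$\mathcal{E}(\mathbf{g}, \mathbf{r}')(\mathbf{h}, \mathbf{r}') = (\mathbf{g}, \mathcal{P}\mathbf{h})$$

for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. The covariance operator $\mathcal{P}$ is self-adjoint and positive
semidefinite, i.e., $(\mathbf{g}, \mathcal{P}\mathbf{h}) = (\mathcal{P}\mathbf{g}, \mathbf{h})$ and $(\mathbf{h}, \mathcal{P}\mathbf{h}) \ge 0$ for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. It is also trace
class, i.e., the sum $\sum_{i=1}^N (\mathbf{h}_i, \mathcal{P}\mathbf{h}_i)$ is finite and independent of the orthonormal
basis $\{\mathbf{h}_i\}_{i=1}^N$, $N = \dim \mathcal{H} \le \infty$, chosen for $\mathcal{H}$. This sum is called the trace of
$\mathcal{P}$:

$$\operatorname{tr} \mathcal{P} = \sum_{i=1}^N (\mathbf{h}_i, \mathcal{P}\mathbf{h}_i) < \infty.$$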

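In addition, there exists an orthonormal basis for $\mathcal{H}$ which consists of
eigenvectors $\{\mathbf{h}_i\}_{i=1}^N$ of $\mathcal{P}$,

$$\mathcal{P}\mathbf{h}_i = \lambda_i \mathbf{h}_i$$

for $i = 1, 2, \ldots, N$, and the corresponding eigenvalues $\{\lambda_i\}_{i=1}^N$ are all
nonnegative. It follows that

$$\lambda_i = (\mathbf{h}_i, \mathcal{P}\mathbf{h}_i) = \mathcal{E}(\mathbf{h}_i, \mathbf{r}')^2 = \sigma_i^2,$$

where $\sigma_i^2$ is the variance of the second-order scalar random variable $(\mathbf{h}_i, \mathbf{r})$, for
$i = 1, 2, \ldots, N$, and that

$$\operatorname{tr} \mathcal{P} = \sum_{i=1}^N \sigma_i^2 = \mathcal{E}\|\mathbf{r}'\|^2.$$

Thus the trace of $\mathcal{P}$ is also called the total variance of the second-order $\mathcal{H}$-valued
random variable $\mathbf{r}$, and

$$\mathcal{E}\|\mathbf{r}\|^2 = \|\bar{\mathbf{r}}\|^2 + \mathcal{E}\|\mathbf{r}'\|^2 = \|\bar{\mathbf{r}}\|^2 + \sum_{i=1}^N \sigma_i^2 = \|\bar{\mathbf{r}}\|^2 + \operatorname{tr} \mathcal{P}. \tag{4}$$

Equation (4) generalizes Eq. (3), which holds for second-order scalar random
variables, to the case of second-order $\mathcal{H}$-valued random variables.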
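In the finite-dimensional case $\mathcal{H} = \mathbb{R}^N$, Eq. (4) and the eigenvalue characterization of the trace can be illustrated with sample moments. This is only a sketch: the dimension, sample size, and the Gaussian samples are arbitrary assumptions, and the covariance operator is replaced by a sample covariance matrix with $1/n$ normalization, for which the decomposition holds to rounding error.

```python
import numpy as np

rng = np.random.default_rng(1)
mixing = rng.normal(size=(4, 4))                      # arbitrary linear map (assumed)
offset = np.array([1.0, -2.0, 0.5, 0.0])              # arbitrary mean (assumed)
samples = rng.normal(size=(2000, 4)) @ mixing.T + offset

r_bar = samples.mean(axis=0)                          # sample mean
P = np.cov(samples, rowvar=False, ddof=0)             # sample covariance matrix
eigvals = np.linalg.eigvalsh(P)                       # eigenvalues of the symmetric matrix P

lhs = np.mean(np.sum(samples**2, axis=1))             # sample version of E||r||^2
rhs = r_bar @ r_bar + np.trace(P)                     # ||r_bar||^2 + tr P

print(lhs, rhs, eigvals.sum())
```

The eigenvalues are the variances along the principal directions, and their sum reproduces the total variance.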

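Suppose that $\mathcal{R} \in \mathcal{B}(\mathcal{H})$. An $\mathcal{R}$-valued random variable is a map $\mathbf{r} : \Omega \to \mathcal{R}$
such that

$$\{\omega \in \Omega : \mathbf{r}(\omega) \in C\} \in \mathcal{F}$$

for every set $C \in \mathcal{B}_{\mathcal{R}}(\mathcal{H})$, where

$$\mathcal{B}_{\mathcal{R}}(\mathcal{H}) = \{B \in \mathcal{B}(\mathcal{H}) : B \subset \mathcal{R}\}.$$

Every $\mathcal{R}$-valued random variable is an $\mathcal{H}$-valued random variable, and every
$\mathcal{H}$-valued random variable $\mathbf{r}$ with $\mathbf{r}(\omega) \in \mathcal{R}$ for all $\omega \in \Omega$ is an $\mathcal{R}$-valued random
variable. An $\mathcal{R}$-valued random variable $\mathbf{r}$ is called second-order if $\|\mathbf{r}\|$ is a
second-order scalar random variable. Thus every second-order $\mathcal{R}$-valued random
variable is a second-order $\mathcal{H}$-valued random variable, and every second-order
$\mathcal{H}$-valued random variable $\mathbf{r}$ with $\mathbf{r}(\omega) \in \mathcal{R}$ for all $\omega \in \Omega$ is a second-order
$\mathcal{R}$-valued random variable. Finally, if $\mathbf{r}$ is an $\mathcal{R}$-valued random variable and $\mathbf{N}$
is a continuous map from $\mathcal{R}$ into $\mathcal{H}$, then $\mathbf{N}(\mathbf{r})$ is an $\mathcal{H}$-valued random variable.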
2.3 The principle of energetic consistency in Hilbert space

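Referring now back to Section 2.1, consider for $\mathbf{s}_{t_0}$ not just a single element of
$\mathcal{S}$, but rather a whole collection of elements $\mathbf{s}_{t_0}(\omega)$ indexed by the probability
variable $\omega \in \Omega$. Suppose at first that $\mathbf{s}_{t_0}$ is simply a map $\mathbf{s}_{t_0} : \Omega \to \mathcal{S}$, i.e.,
that $\mathbf{s}_{t_0}(\omega)$ is defined for all $\omega \in \Omega$ and $\mathbf{s}_{t_0}(\omega) \in \mathcal{S}$ for all $\omega \in \Omega$. Then since
$\mathbf{N}_{t,t_0} : \mathcal{S} \to \mathcal{H}$ for all $t \in \mathcal{T}$, it follows that $\mathbf{s}_t = \mathbf{N}_{t,t_0}(\mathbf{s}_{t_0}) : \Omega \to \mathcal{H}$ for all $t \in \mathcal{T}$,
with

$$\mathbf{s}_t(\omega) = \mathbf{N}_{t,t_0}(\mathbf{s}_{t_0}(\omega))$$

and $\|\mathbf{s}_t(\omega)\| < \infty$, for all $\omega \in \Omega$ and $t \in \mathcal{T}$.

Suppose further that $\mathbf{s}_{t_0}$ is an $\mathcal{S}$-valued random variable. Then it follows
from the continuity assumption on $\mathbf{N}_{t,t_0}$ that $\mathbf{s}_t$ is an $\mathcal{H}$-valued random variable,
and therefore that $E_t = \|\mathbf{s}_t\|^2$ is a scalar random variable, for all $t \in \mathcal{T}$.

Suppose still further that $\mathbf{s}_{t_0}$ is a second-order $\mathcal{S}$-valued random variable,
$\mathcal{E}E_{t_0} = \mathcal{E}\|\mathbf{s}_{t_0}\|^2 < \infty$. Then from the boundedness assumption on $\mathbf{N}_{t,t_0}$,

$$\|\mathbf{s}_t(\omega)\|^2 \le M_{t,t_0}^2 \|\mathbf{s}_{t_0}(\omega)\|^2$$

for all $\omega \in \Omega$ and $t \in \mathcal{T}$, it follows that

$$\mathcal{E}E_t = \mathcal{E}\|\mathbf{s}_t\|^2 \le M_{t,t_0}^2\, \mathcal{E}\|\mathbf{s}_{t_0}\|^2 < \infty$$

for all $t \in \mathcal{T}$. Therefore, $\mathbf{s}_t$ is a second-order $\mathcal{H}$-valued random variable, with
mean $\bar{\mathbf{s}}_t \in \mathcal{H}$, covariance operator $\mathcal{P}_t : \mathcal{H} \to \mathcal{H}$, and

$$\mathcal{E}\|\mathbf{s}_t\|^2 = \|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t,$$

for all $t \in \mathcal{T}$. Thus the principle of energetic consistency has been established: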

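Theorem 1 Let $\mathcal{H}$, $\mathcal{S}$, $\mathcal{T}$ and $\mathbf{N}_{t,t_0}$ be as stated in Section 2.1, with $\mathbf{N}_{t,t_0}$
continuous and bounded for all $t \in \mathcal{T}$, and let $\mathcal{E}$ be the expectation operator on
a complete probability space $(\Omega, \mathcal{F}, P)$. If $\mathbf{s}_{t_0}$ is a second-order $\mathcal{S}$-valued random
variable, then for all $t \in \mathcal{T}$, (i) $\mathbf{s}_t = \mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})$ is a second-order $\mathcal{H}$-valued
random variable, (ii) $E_t = \|\mathbf{s}_t\|^2$ is a scalar random variable, (iii) $\mathbf{s}_t$ has mean
$\bar{\mathbf{s}}_t \in \mathcal{H}$ and covariance operator $\mathcal{P}_t : \mathcal{H} \to \mathcal{H}$, (iv)

$$\mathcal{E}E_t = \|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t,$$

and (v)

$$\|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t \le M_{t,t_0}^2 \left( \|\bar{\mathbf{s}}_{t_0}\|^2 + \operatorname{tr} \mathcal{P}_{t_0} \right). \tag{5}$$

If, in addition, $\mathbf{N}_{t,t_0}$ is conservative, then (vi)

$$\|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t = \|\bar{\mathbf{s}}_{t_0}\|^2 + \operatorname{tr} \mathcal{P}_{t_0} \tag{6}$$

for all $t \in \mathcal{T}$.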

It is in the conservative case that the principle of energetic consistency is
most useful, because in that case, Eq. (6) provides an equality against which,
for instance, approximate moment evolution schemes can be compared. In case
$\mathbf{N}_{t,t_0}$ is only bounded, for example in the presence of dissipation, or for
initial-boundary value problems with a net flux of energy across the boundaries, Eq. (5)
still provides an upper bound on the total variance $\operatorname{tr} \mathcal{P}_t$.
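As an illustration of how Eq. (6) can serve as a test, the sketch below evaluates its residual for one hypothetical approximate moment evolution scheme: the mean is advanced with the nonlinear model and the covariance with the model Jacobian (an extended-Kalman-filter-style approximation, assumed here for illustration and not attributed to the chapter). The conservative test model $d\mathbf{s}/dt = (1 + \|\mathbf{s}\|^2)\mathbf{J}\mathbf{s}$ in $\mathbb{R}^2$, the initial moments, and all function names are likewise assumptions.

```python
import numpy as np

def flow(s0, t):
    """Exact flow of the conservative system ds/dt = (1 + |s|^2) J s in R^2."""
    theta = (1.0 + s0 @ s0) * t
    c, sn = np.cos(theta), np.sin(theta)
    return np.array([c * s0[0] - sn * s0[1], sn * s0[0] + c * s0[1]])

def jacobian(fun, s0, eps=1e-6):
    """Central-difference Jacobian of fun at s0."""
    n = s0.size
    jac = np.empty((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = eps
        jac[:, j] = (fun(s0 + e) - fun(s0 - e)) / (2.0 * eps)
    return jac

def consistency_residual(mean0, cov0, mean1, cov1):
    """Left-hand side minus right-hand side of Eq. (6)."""
    return (mean1 @ mean1 + np.trace(cov1)) - (mean0 @ mean0 + np.trace(cov0))

t = 2.0
mean0 = np.array([1.0, 0.5])
cov0 = 0.09 * np.eye(2)

M = jacobian(lambda s: flow(s, t), mean0)   # tangent linear model at the mean
mean1 = flow(mean0, t)                      # mean advanced by the nonlinear model
cov1 = M @ cov0 @ M.T                       # covariance advanced linearly
residual = consistency_residual(mean0, cov0, mean1, cov1)
print(f"residual of Eq. (6): {residual:.6f}")
```

Because the mean is advanced by a conservative map, its energy is unchanged, so any growth of the approximate total variance appears directly as a nonzero residual; an exactly evolved mean and covariance would give zero.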
2.4 A natural restriction on S

Suppose for the moment that $\mathbf{s}_{t_0}$ is an $\mathcal{S}$-valued random variable, not necessarily
second-order. When the squared norm on $\mathcal{H}$ represents a physical total energy,
it is natural to impose the restriction that every possible initial state $\mathbf{s}_{t_0}(\omega)$,
$\omega \in \Omega$, has total energy less than some finite maximum amount, say $E_* < \infty$,
i.e. that $\mathcal{S} \subset \mathcal{H}_{E_*}$, where $\mathcal{H}_E$ is defined for all $E > 0$ as the open set

$$\mathcal{H}_E = \{\mathbf{s} \in \mathcal{H} : \|\mathbf{s}\|^2 < E\}. \tag{7}$$

Otherwise, given any total energy $E$, no matter how large, there would be a
nonzero probability that $\mathbf{s}_{t_0}$ has total energy greater than or equal to $E$:

$$P(\{\omega \in \Omega : \|\mathbf{s}_{t_0}(\omega)\|^2 \ge E\}) > 0.$$

Of course, it can be argued that since this probability would be very small for
$E$ very large, it may be acceptable as an approximation not to impose this
restriction. On the other hand, as discussed in Section 3.2 and illustrated in
Section 4, for classical solutions of hyperbolic systems of partial differential
equations, it is necessary to require that $\mathcal{S} \subset \mathcal{H}_{E_*}$ for some $E_* < \infty$ just
to ensure well-posedness. Thus the restriction is often not only natural, but
also necessary. It also simplifies matters, as discussed next, for it makes $\mathbf{s}_{t_0}$
second-order automatically and gives $\mathbf{s}_t = \mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})$ some additional desirable
properties, and it also yields a convenient characterization of $\mathbf{s}_{t_0}$ and $\mathbf{s}_t$.

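Suppose that $\mathbf{s}_{t_0}$ is an $\mathcal{S}$-valued random variable, and that $\mathcal{S} \subset \mathcal{H}_{E_*}$ for some
$E_* < \infty$. Thus $\|\mathbf{s}_{t_0}(\omega)\|^2 < E_*$ for all $\omega \in \Omega$, and therefore $\mathcal{E}\|\mathbf{s}_{t_0}\|^2 < E_*$, i.e.,
$\mathbf{s}_{t_0}$ is a second-order $\mathcal{S}$-valued random variable. Therefore, for all $t \in \mathcal{T}$, $\mathbf{s}_t =
\mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})$ is a second-order $\mathcal{H}$-valued random variable, in fact with $\mathbf{s}_t(\omega) \in \mathcal{H}_E$
for all $\omega \in \Omega$, where $E = E_*$ in the conservative case and $E = M_{t,t_0}^2 E_*$ in the
merely bounded case. Since $\mathcal{H}_E$ is an open set in $\mathcal{H}$, $\mathcal{H}_E \in \mathcal{B}(\mathcal{H})$. Therefore,
for all $t \in \mathcal{T}$, $\mathbf{s}_t$ is an $\mathcal{H}_E$-valued random variable. Further, for all $p > 0$ and
$t \in \mathcal{T}$, $\mathcal{E}\|\mathbf{s}_t\|^p < E^{p/2}$. Thus $\|\mathbf{s}_t\|$ has finite moments of all orders, for all $t \in \mathcal{T}$.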

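Now suppose that $\mathbf{s}$ is an $\mathcal{H}_E$-valued random variable, for some $E < \infty$.
Then since $\mathbf{s}$ is also an $\mathcal{H}$-valued random variable, $(\mathbf{h}_i, \mathbf{s})$ is a scalar random
variable for $i = 1, \ldots, N$, where $\{\mathbf{h}_i\}_{i=1}^N$ is any orthonormal basis for $\mathcal{H}$ and
$N = \dim \mathcal{H} \le \infty$. Since $\mathbf{s}(\omega) \in \mathcal{H}$ for all $\omega \in \Omega$, $\mathbf{s}(\omega)$ has the representation

$$\mathbf{s}(\omega) = \sum_{i=1}^N (\mathbf{h}_i, \mathbf{s}(\omega))\, \mathbf{h}_i$$

for each $\omega \in \Omega$, and by Parseval's relation,

$$\|\mathbf{s}(\omega)\|^2 = \sum_{i=1}^N (\mathbf{h}_i, \mathbf{s}(\omega))^2 < E$$

for each $\omega \in \Omega$.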





It is shown in Appendix A.4 that if $\{s_i\}_{i=1}^N$ is any collection of scalar random
variables with $\sum_{i=1}^N \mathcal{E}s_i^2 < \infty$, where $N = \dim \mathcal{H} \le \infty$, then there is a
second-order $\mathcal{H}$-valued random variable $\mathbf{s}$ such that $(\mathbf{h}_i, \mathbf{s}(\omega)) = s_i(\omega)$ for $i = 1, \ldots, N$
and for all $\omega \in \Omega$ with $\sum_{i=1}^N s_i^2(\omega) < \infty$. Therefore, if $\{s_i\}_{i=1}^N$ is any collection
of scalar random variables with $\sum_{i=1}^N s_i^2(\omega) < E$ for all $\omega \in \Omega$, then there is
a second-order $\mathcal{H}$-valued random variable $\mathbf{s}$ such that $(\mathbf{h}_i, \mathbf{s}(\omega)) = s_i(\omega)$ for
$i = 1, \ldots, N$ and for all $\omega \in \Omega$, in which case $\mathbf{s}(\omega) = \sum_{i=1}^N s_i(\omega)\mathbf{h}_i$ for all $\omega \in \Omega$,
and so by Parseval's relation, this $\mathbf{s}$ is an $\mathcal{H}_E$-valued random variable.

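Thus, a map $\mathbf{s} : \Omega \to \mathcal{H}$ is an $\mathcal{H}_E$-valued random variable if, and only if,

$$\mathbf{s}(\omega) = \sum_{i=1}^N s_i(\omega)\, \mathbf{h}_i$$

for all $\omega \in \Omega$, where $\{s_i\}_{i=1}^N$ is a collection of scalar random variables with

$$\sum_{i=1}^N s_i^2(\omega) < E$$

for all $\omega \in \Omega$, in which case

$$s_i(\omega) = (\mathbf{h}_i, \mathbf{s}(\omega))$$

for $i = 1, \ldots, N$ and for all $\omega \in \Omega$. In particular, $|s_i(\omega)| < E^{1/2}$ for $i = 1, \ldots, N$
and for all $\omega \in \Omega$, which is a strong restriction on the scalar random variables
$s_i = (\mathbf{h}_i, \mathbf{s})$. It implies immediately that the probability distribution functions

$$F_{(\mathbf{h}_i, \mathbf{s})}(x) = P(\{\omega \in \Omega : (\mathbf{h}_i, \mathbf{s}(\omega)) \le x\})$$

must satisfy

$$F_{(\mathbf{h}_i, \mathbf{s})}(x) = \begin{cases} 0 & \text{if } x \le -E^{1/2}, \\ 1 & \text{if } x \ge E^{1/2}, \end{cases}$$

for $i = 1, \ldots, N$. Thus $(\mathbf{h}_i, \mathbf{s})$ cannot be Gaussian-distributed, for instance, for
any $i = 1, \ldots, N$. Also, since $\|\mathbf{s}(\omega)\| < E^{1/2}$ for all $\omega \in \Omega$, the probability
distribution function

$$F_{\|\mathbf{s}\|}(x) = P(\{\omega \in \Omega : \|\mathbf{s}(\omega)\| \le x\})$$

of the scalar random variable $\|\mathbf{s}\|$ must satisfy

$$F_{\|\mathbf{s}\|}(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge E^{1/2}. \end{cases}$$

The characterization of $\mathcal{H}_E$-valued random variables given above will be used
in Section 4 to construct an $\mathcal{H}_E$-valued random variable $\mathbf{s}_{t_0}$ for the shallow-water
equations. This will guarantee directly that the random initial geopotential field
is positive.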
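The coefficient characterization above suggests a direct way to generate samples of an $\mathcal{H}_E$-valued random variable in finite dimensions: draw coefficient vectors $(s_1, \ldots, s_N)$ whose squared sum is less than $E$. The sketch below is an illustrative assumption, not the construction used in Section 4; it draws points uniformly in the ball of radius $E^{1/2}$ in $\mathbb{R}^N$, and the function name is hypothetical.

```python
import numpy as np

def sample_in_energy_ball(rng, n_samples, dim, energy_max):
    """Draw coefficient vectors (s_1, ..., s_N) with sum_i s_i^2 < energy_max.

    Gaussian draws supply uniformly distributed directions; radii are scaled
    so that the points are uniform in the ball. rng.uniform samples from
    [0, 1), so the radii stay below sqrt(energy_max).
    """
    g = rng.normal(size=(n_samples, dim))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    u = rng.uniform(0.0, 1.0, size=(n_samples, 1))
    radii = np.sqrt(energy_max) * u ** (1.0 / dim)
    return directions * radii

rng = np.random.default_rng(2)
E = 4.0
coeffs = sample_in_energy_ball(rng, n_samples=10000, dim=8, energy_max=E)

energies = np.sum(coeffs**2, axis=1)
print("largest sampled energy:", energies.max())
print("largest |s_i|:", np.abs(coeffs).max())
```

Every sampled coefficient is bounded in magnitude by $E^{1/2}$, which is the property that rules out Gaussian marginals.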



3 The principle of energetic consistency for differential equations

3.1 Ordinary differential equations 

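Consider a nonlinear system of ordinary differential equations

$$\frac{d\mathbf{s}}{dt} + \mathbf{f}(\mathbf{s}, t) = \mathbf{0}, \tag{8}$$

where $\mathbf{f} : \mathcal{S}_1 \times \mathcal{T}_1 \to \mathbb{R}^N$, with $\mathcal{S}_1$ an open connected set in $\mathbb{R}^N$, possibly all of
$\mathbb{R}^N$, and with $\mathcal{T}_1 = [t_0, T_1]$ and $0 < T_1 - t_0 < \infty$. Take $\mathcal{H} = \mathbb{R}^N$, with $(\cdot, \cdot)$
denoting the Euclidean inner product, $(\mathbf{g}, \mathbf{h}) = \mathbf{g}^T\mathbf{h}$ for all $\mathbf{g}, \mathbf{h} \in \mathbb{R}^N$, and $\|\cdot\|$
the Euclidean norm, $\|\mathbf{h}\| = (\mathbf{h}^T\mathbf{h})^{1/2}$ for all $\mathbf{h} \in \mathbb{R}^N$.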

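Assume that $\mathbf{f}$ is of class $C(\mathcal{S}_1 \times \mathcal{T}_1)$, i.e., that $\mathbf{f}$ is continuous on its domain
of definition $\mathcal{S}_1 \times \mathcal{T}_1$. Assume also that $\mathbf{f}$ is bounded on $\mathcal{S}_1 \times \mathcal{T}_1$:

$$\|\mathbf{f}(\mathbf{s}, t)\| \le C_1$$

for some constant $C_1$, for all $\mathbf{s} \in \mathcal{S}_1$ and $t \in \mathcal{T}_1$. Note that the latter assumption
follows from the former one if $\mathcal{S}_1 \subset \mathcal{H}_E$ for some $E < \infty$, where $\mathcal{H}_E$ was defined
in Eq. (7). Assume finally that $\mathbf{f}$ is Lipschitz continuous in its first argument,
uniformly in time, i.e., that there is a constant $C_2$ such that

$$\|\mathbf{f}(\mathbf{r}, t) - \mathbf{f}(\mathbf{s}, t)\| \le C_2 \|\mathbf{r} - \mathbf{s}\|$$

for all $\mathbf{r}, \mathbf{s} \in \mathcal{S}_1$ and $t \in \mathcal{T}_1$.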

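A real $N$-vector function $\mathbf{s} = \mathbf{s}(t)$ defined on an interval $\mathcal{T}_* = [t_0, T_*]$, $T_* \in
(t_0, T_1]$, is called a (continuous) solution of Eq. (8) if, for all $t \in \mathcal{T}_*$, (i) $\mathbf{s}(t) \in \mathcal{S}_1$,
(ii) $\mathbf{s}(t)$ is continuous, and (iii) $\mathbf{s}(t)$ satisfies Eq. (8) pointwise. It follows from
the continuity assumption on $\mathbf{f}$ that if $\mathbf{s}$ is a solution on an interval $\mathcal{T}_*$, then
$d\mathbf{s}/dt$ is continuous on $\mathcal{T}_*$, and so

$$\frac{d}{dt}\|\mathbf{s}\|^2 = \frac{d}{dt}(\mathbf{s}, \mathbf{s}) = 2\left(\mathbf{s}, \frac{d\mathbf{s}}{dt}\right) = -2(\mathbf{s}, \mathbf{f}(\mathbf{s}, t)) \tag{9}$$

is also continuous on $\mathcal{T}_*$, hence integrable on $\mathcal{T}_*$. Similarly, if $\mathbf{r}$ and $\mathbf{s}$ are two
solutions on an interval $\mathcal{T}_*$, then by the Schwarz inequality,

$$\left| \|\mathbf{r} - \mathbf{s}\| \frac{d}{dt}\|\mathbf{r} - \mathbf{s}\| \right| = |(\mathbf{r} - \mathbf{s}, \mathbf{f}(\mathbf{r}, t) - \mathbf{f}(\mathbf{s}, t))| \le \|\mathbf{r} - \mathbf{s}\| \, \|\mathbf{f}(\mathbf{r}, t) - \mathbf{f}(\mathbf{s}, t)\|$$

for all $t \in \mathcal{T}_*$, and so by integrating it follows from the Lipschitz continuity
assumption that

$$\|\mathbf{r}(t) - \mathbf{s}(t)\| \le e^{C_2 (t - t_0)} \|\mathbf{r}(t_0) - \mathbf{s}(t_0)\|$$

for all $t \in \mathcal{T}_*$. Thus, if $\mathbf{r}(t_0) = \mathbf{s}(t_0)$ then $\mathbf{r}(t) = \mathbf{s}(t)$ for all $t \in \mathcal{T}_*$: for each
$\mathbf{s}_{t_0} \in \mathcal{S}_1$ there exists at most one solution $\mathbf{s}(t)$ defined on an interval $\mathcal{T}_*$, such
that $\mathbf{s}(t_0) = \mathbf{s}_{t_0}$. The inequality also shows that if such a solution exists, then
it depends continuously on $\mathbf{s}_{t_0}$, for all $t \in \mathcal{T}_*$.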
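The stability estimate above can be observed numerically. The sketch below assumes the illustrative right-hand side $\mathbf{f}(\mathbf{s}, t) = \sin \mathbf{s}$ (componentwise), which is bounded and Lipschitz with $C_2 = 1$, and uses a classical fourth-order Runge-Kutta integrator as a stand-in for the exact solution; the initial states and step count are arbitrary choices.

```python
import numpy as np

C2 = 1.0  # Lipschitz constant of the illustrative f below

def f(s, t):
    """Right-hand side in ds/dt + f(s, t) = 0; componentwise sine (assumed)."""
    return np.sin(s)

def integrate(s, t0, t1, n_steps=2000):
    """Classical RK4 for ds/dt = -f(s, t), from t0 to t1."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = -f(s, t)
        k2 = -f(s + 0.5 * h * k1, t + 0.5 * h)
        k3 = -f(s + 0.5 * h * k2, t + 0.5 * h)
        k4 = -f(s + h * k3, t + h)
        s = s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return s

t0, t1 = 0.0, 1.5
r0 = np.array([0.5, 2.0])
s0 = np.array([0.6, 2.2])

dist0 = np.linalg.norm(r0 - s0)
dist1 = np.linalg.norm(integrate(r0, t0, t1) - integrate(s0, t0, t1))
bound = np.exp(C2 * (t1 - t0)) * dist0
print(f"distance at t1: {dist1:.6f}, bound: {bound:.6f}")
```

The bound is generally not sharp; it only limits how fast two solutions can separate, which is what uniqueness and continuous dependence require.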

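The continuity and boundedness assumptions on $\mathbf{f}$ together imply that, for
each $\mathbf{s}_{t_0} \in \mathcal{S}_1$, there does exist a solution $\mathbf{s}(t)$ with $\mathbf{s}(t_0) = \mathbf{s}_{t_0}$, and that it
remains in existence either until time $t = T_1$ or the first time that the solution
hits the boundary $\partial\mathcal{S}_1$ of $\mathcal{S}_1$, where $\mathbf{f}$ may not be defined, whichever is smaller
(e.g. Coddington and Levinson (1955, pp. 6, 15)). Thus, if $\mathcal{S}_1 = \mathbb{R}^N$, then the
solution exists until time $T_1$. This time can be arbitrarily large, for instance if $\mathbf{f}$
is independent of time. More generally, a minimum existence time can be found
by noting that the solution $\mathbf{s}(t)$ with $\mathbf{s}(t_0) = \mathbf{s}_{t_0} \in \mathcal{S}_1$ must satisfy the integral
equation

$$\mathbf{s}(t) = \mathbf{s}_{t_0} - \int_{t_0}^{t} \mathbf{f}(\mathbf{s}(\tau), \tau)\, d\tau,$$

and so

$$\|\mathbf{s}(t) - \mathbf{s}_{t_0}\| \le C_1 (t - t_0)$$

by the boundedness assumption on $\mathbf{f}$, for as long as the solution exists. Denoting
by $\rho(\mathbf{s}_{t_0})$ the Euclidean distance from any $\mathbf{s}_{t_0} \in \mathcal{S}_1$ to $\partial\mathcal{S}_1$,

$$\rho(\mathbf{s}_{t_0}) = \inf_{\mathbf{h} \in \partial\mathcal{S}_1} \|\mathbf{h} - \mathbf{s}_{t_0}\|,$$

it follows that $\|\mathbf{s}(t) - \mathbf{s}_{t_0}\| < \rho(\mathbf{s}_{t_0})$ if $t - t_0 < \rho(\mathbf{s}_{t_0})/C_1$, and so the solution
exists on $\mathcal{T}_* = [t_0, T_*]$ for

$$T_* = T_*(\mathbf{s}_{t_0}) = \min(T_1,\; t_0 + \rho(\mathbf{s}_{t_0})/C_1).$$

Note that $\rho(\mathbf{s}_{t_0}) > 0$ for each $\mathbf{s}_{t_0} \in \mathcal{S}_1$ since $\mathcal{S}_1$ is an open set, and therefore
$T_* > t_0$.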
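The minimum existence time is simple to evaluate once the distance to the boundary is known. As a small worked example, assume (purely for illustration) that $\mathcal{S}_1$ is the open ball of radius $R$ centered at the origin, so that $\rho(\mathbf{s}) = R - \|\mathbf{s}\|$; the function names below are hypothetical.

```python
import numpy as np

def distance_to_ball_boundary(s, radius):
    """rho(s) for S_1 the open ball of the given radius centered at 0."""
    return radius - np.linalg.norm(s)

def min_existence_time(t0, t1, rho, c1):
    """T_* = min(T_1, t_0 + rho / C_1), with C_1 a bound on ||f||."""
    return min(t1, t0 + rho / c1)

s_t0 = np.array([3.0, 4.0])                          # ||s_t0|| = 5
rho = distance_to_ball_boundary(s_t0, radius=10.0)   # rho = 5
T_star = min_existence_time(t0=0.0, t1=10.0, rho=rho, c1=2.0)
print(T_star)  # 2.5
```

If $T_1$ is smaller than $t_0 + \rho/C_1$, the solution is guaranteed on all of $\mathcal{T}_1$ and the function returns $T_1$.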

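The principle of energetic consistency requires a set $\mathcal{S} \in \mathcal{B}(\mathcal{H}) = \mathcal{B}(\mathbb{R}^N)$ for
the initial data and a time interval $\mathcal{T} = [t_0, T]$ such that, for every $\mathbf{s}_{t_0} \in \mathcal{S}$, the
corresponding solution exists on $\mathcal{T}$, i.e., every solution must exist for the same
minimum amount of time $T - t_0 > 0$, independently of the location of $\mathbf{s}_{t_0} \in \mathcal{S}$. If
$\mathcal{S}_1 = \mathbb{R}^N$, then take $\mathcal{S} = \mathbb{R}^N$ and $\mathcal{T} = [t_0, T_1]$. Otherwise, let $\mathcal{S}$ be any open set
in $\mathbb{R}^N$ which is contained in the interior of $\mathcal{S}_1$, and denote by $\rho_{\mathcal{S}}$ the minimum
Euclidean distance from the boundary of $\mathcal{S}$ to that of $\mathcal{S}_1$. Then

$$\rho(\mathbf{s}_{t_0}) \ge \rho_{\mathcal{S}} = \inf_{\mathbf{s} \in \mathcal{S}} \rho(\mathbf{s}) > 0$$

for all $\mathbf{s}_{t_0} \in \mathcal{S}$, and setting

$$T = T_{\mathcal{S}} = \min(T_1,\; t_0 + \rho_{\mathcal{S}}/C_1)$$

and $\mathcal{T} = \mathcal{T}(\mathcal{S}) = [t_0, T]$, it follows that the unique solution $\mathbf{s}(t)$ corresponding
to each $\mathbf{s}_{t_0} \in \mathcal{S}$ exists for all $t \in \mathcal{T}$. Denoting this solution by $\mathbf{s}_t = \mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})$,
it follows that $\mathbf{N}_{t,t_0}$ is defined uniquely on $\mathcal{S}$, as a continuous map from $\mathcal{S}$ into
$\mathcal{H} = \mathbb{R}^N$, for all $t \in \mathcal{T}$.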
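It follows from Eq. (9) that the solution operator $\mathbf{N}_{t,t_0}$ is conservative if
$(\mathbf{s}, \mathbf{f}(\mathbf{s}, t)) = 0$ for all $\mathbf{s} \in \mathcal{S}_1$ and $t \in \mathcal{T}$. More generally, it follows that if there
is a constant $C_3$ such that

$$|(\mathbf{s}, \mathbf{f}(\mathbf{s}, t))| \le C_3 \|\mathbf{s}\|^2$$

for all $\mathbf{s} \in \mathcal{S}_1$ and $t \in \mathcal{T}$, then $\mathbf{N}_{t,t_0}$ is bounded, with

$$\|\mathbf{s}_t\| = \|\mathbf{N}_{t,t_0}(\mathbf{s}_{t_0})\| \le e^{C_3 (t - t_0)} \|\mathbf{s}_{t_0}\|$$

for all $t \in \mathcal{T}$. Note that if $\mathbf{f}(\mathbf{s}, t)$ depends linearly on $\mathbf{s}$ near the origin of
coordinates $\mathbf{0} \in \mathbb{R}^N$, or if $\mathbf{0} \notin \mathcal{S}_1$, then by the boundedness assumption on $\mathbf{f}$
there is a constant $C_3$ such that $\|\mathbf{f}(\mathbf{s}, t)\| \le C_3 \|\mathbf{s}\|$ for all $\mathbf{s} \in \mathcal{S}_1$ and $t \in \mathcal{T}$, so
that by the Schwarz inequality,

$$|(\mathbf{s}, \mathbf{f}(\mathbf{s}, t))| \le \|\mathbf{s}\| \, \|\mathbf{f}(\mathbf{s}, t)\| \le C_3 \|\mathbf{s}\|^2$$

for all $\mathbf{s} \in \mathcal{S}_1$ and $t \in \mathcal{T}$, and therefore $\mathbf{N}_{t,t_0}$ is bounded for all $t \in \mathcal{T}$.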
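The sufficient condition $(\mathbf{s}, \mathbf{f}(\mathbf{s}, t)) = 0$ is easy to test for a given right-hand side. The sketch below uses a quadratic triad in $\mathbb{R}^3$ whose coefficients sum to zero; this system is an assumption chosen for illustration (it is bounded only on bounded sets $\mathcal{S}_1$), and the Runge-Kutta integration merely approximates the solution operator.

```python
import numpy as np

A = np.array([1.0, -2.0, 1.0])  # triad coefficients, chosen to sum to zero

def f(s, t=0.0):
    """f in ds/dt + f(s, t) = 0; (s, f(s)) = -(a1 + a2 + a3) s1 s2 s3 = 0."""
    return -A * np.array([s[1] * s[2], s[2] * s[0], s[0] * s[1]])

def integrate(s, t0, t1, n_steps=5000):
    """Classical RK4 for ds/dt = -f(s, t)."""
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = -f(s, t)
        k2 = -f(s + 0.5 * h * k1, t + 0.5 * h)
        k3 = -f(s + 0.5 * h * k2, t + 0.5 * h)
        k4 = -f(s + h * k3, t + h)
        s = s + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return s

rng = np.random.default_rng(3)
points = rng.normal(size=(100, 3))
max_inner = max(abs(p @ f(p)) for p in points)   # should vanish up to rounding

s0 = np.array([1.0, 0.5, -0.3])
s1 = integrate(s0, 0.0, 5.0)
energy_drift = abs(s1 @ s1 - s0 @ s0)            # small, limited by RK4 accuracy
print(max_inner, energy_drift)
```

The orthogonality check is exact up to rounding, whereas the energy drift reflects the time discretization, which is not itself guaranteed to be conservative.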


3.2 Symmetric hyperbolic partial differential equations 

3.2.1 The deterministic initial-value problem

Consider now a nonlinear system of partial differential equations

$$\frac{\partial \mathbf{s}}{\partial t} + G\mathbf{s} = \mathbf{0}, \tag{10}$$

where $G = G(\mathbf{s}) = G(\mathbf{s}, \mathbf{x}, t)$ is a linear differential operator of first order in
space variables $\mathbf{x} = (x_1, \ldots, x_d)^T$,

$$G = \sum_{j=1}^{d} \mathbf{A}_j(\mathbf{s}, \mathbf{x}, t) \frac{\partial}{\partial x_j} + \mathbf{A}_{d+1}(\mathbf{s}, \mathbf{x}, t),$$

and $\mathbf{A}_1, \ldots, \mathbf{A}_{d+1}$ are real $n \times n$ matrices. For simplicity assume that the
$d$-dimensional spatial domain $D$ of the problem is

$$D = \{\mathbf{x} \in \mathbb{R}^d : |x_j| \le L_j,\; j = 1, \ldots, d\},$$

with periodic boundary conditions at the endpoints $x_j = \pm L_j$, $j = 1, \ldots, d$.
Consider endpoint $x_j = L_j$ to be identified with endpoint $x_j = -L_j$, for each
$j = 1, \ldots, d$, so that a continuous function on $D$ satisfies the periodic boundary
conditions automatically. (Spherical geometry will be treated in Section 4.)
Take $\mathcal{H} = L^2(D)$, the Hilbert space of real, Lebesgue square-integrable $n$-vectors
on $D$, with inner product

$$(\mathbf{g}, \mathbf{h}) = \int_D \mathbf{g}^T(\mathbf{x})\, \mathbf{h}(\mathbf{x})\, dx_1 \cdots dx_d$$

for all $\mathbf{g}, \mathbf{h} \in L^2(D)$, and corresponding norm $\|\mathbf{h}\| = (\mathbf{h}, \mathbf{h})^{1/2}$ for all $\mathbf{h} \in L^2(D)$.

Assume that the matrices $\mathbf{A}_1, \ldots, \mathbf{A}_{d+1}$ are defined on all of $\mathbb{R}^n \times D \times \mathcal{T}_1$,
where $\mathcal{T}_1 = [t_0, T_1]$ and $0 < T_1 - t_0 < \infty$. Assume further that each matrix
is of class $C^\infty(\mathbb{R}^n \times D \times \mathcal{T}_1)$, i.e., that all of the matrix elements and all of
their partial derivatives are continuous functions on $\mathbb{R}^n \times D \times \mathcal{T}_1$ and satisfy
the periodic boundary conditions in the space variables. Assume finally that
$\mathbf{A}_1, \ldots, \mathbf{A}_d$ (but not $\mathbf{A}_{d+1}$) are symmetric matrices.

A real $n$-vector function $\mathbf{s} = \mathbf{s}(\mathbf{x}, t)$ defined on $D \times \mathcal{T}_*$, with $\mathcal{T}_* = [t_0, T_*]$ and
$T_* \in (t_0, T_1]$, is called a classical solution of the symmetric hyperbolic system
Eq. (10) if (i) $\mathbf{s} \in C^1(D) \cap C^1(\mathcal{T}_*)$ and (ii) $\mathbf{s}$ satisfies Eq. (10) pointwise in
$D \times \mathcal{T}_*$. The condition $\mathbf{s} \in C^1(D) \cap C^1(\mathcal{T}_*)$ means that the components of
the vector $\mathbf{s}$ and their first time and space derivatives are continuous on $D$ for
each fixed $t \in \mathcal{T}_*$, are continuous on $\mathcal{T}_*$ for each fixed $\mathbf{x} \in D$, and satisfy the
periodic boundary conditions. The initial condition for a classical solution is a
real $n$-vector function $\mathbf{s}_{t_0} \in C^1(D)$.

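Suppose for the moment that $\mathbf{s} = \mathbf{s}(\mathbf{x}, t)$ is a classical solution on $D \times \mathcal{T}_*$.
Then

$$\frac{d}{dt}\|\mathbf{s}\|^2 = \frac{d}{dt}(\mathbf{s}, \mathbf{s}) = 2\left(\mathbf{s}, \frac{\partial \mathbf{s}}{\partial t}\right) = -2(\mathbf{s}, G(\mathbf{s})\mathbf{s}) \tag{11}$$

is continuous on $\mathcal{T}_*$. Also, by using the symmetry of $\mathbf{A}_1, \ldots, \mathbf{A}_d$ and the periodic
boundary conditions, an integration by parts gives

$$(\mathbf{s}, G(\mathbf{s})\mathbf{s}) = \int_D \mathbf{s}^T(\mathbf{x}, t)\, \mathbf{B}(\mathbf{x}, t)\, \mathbf{s}(\mathbf{x}, t)\, dx_1 \cdots dx_d \tag{12}$$

for all $t \in \mathcal{T}_*$, where

$$\mathbf{B}(\mathbf{x}, t) = \mathbf{A}_{d+1}(\mathbf{s}, \mathbf{x}, t) - \frac{1}{2} \sum_{j=1}^{d} \frac{d\mathbf{A}_j(\mathbf{s}, \mathbf{x}, t)}{dx_j}$$

and

$$\frac{d\mathbf{A}_j}{dx_j} = \sum_{i=1}^{n} \frac{\partial \mathbf{A}_j}{\partial s_i} \frac{\partial s_i}{\partial x_j} + \frac{\partial \mathbf{A}_j}{\partial x_j}.$$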
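The integration by parts leading to Eq. (12) can be verified numerically in the simplest setting $d = n = 1$. In the sketch below the coefficient $A_1(s, x) = 2 + \cos(\pi x / L) + s$, the constant $A_2 = 0.3$, and the trigonometric field $s$ are all illustrative assumptions; integrals over the periodic domain $D = [-L, L]$ are computed with the uniform-grid rule, which is exact for trigonometric polynomials of degree below the number of grid points.

```python
import numpy as np

L = 1.0
n_grid = 64
x = np.linspace(-L, L, n_grid, endpoint=False)   # periodic grid on D = [-L, L]
k = np.pi / L

# A smooth periodic state and its exact derivative (illustrative choice).
s = np.sin(k * x) + 0.5 * np.cos(2.0 * k * x)
s_x = k * np.cos(k * x) - k * np.sin(2.0 * k * x)

A1 = 2.0 + np.cos(k * x) + s        # A_1(s, x); scalar, hence symmetric
dA1_dx = s_x - k * np.sin(k * x)    # total derivative: dA_1/ds * s_x + dA_1/dx
A2 = 0.3                            # A_{d+1}, taken constant

def integral(values):
    """Integral over D using the uniform periodic grid."""
    return 2.0 * L * values.mean()

lhs = integral(s * (A1 * s_x + A2 * s))   # (s, G(s)s) computed directly
B = A2 - 0.5 * dA1_dx                     # B(x, t) as in Eq. (12)
rhs = integral(s * B * s)                 # integral of s^T B s
print(lhs, rhs)
```

The total derivative of $A_1$ includes its dependence on $s$, which is how the bound on $(\mathbf{s}, G(\mathbf{s})\mathbf{s})$ comes to depend on the spatial derivatives of the solution.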


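Further, since $\mathbf{s} \in C^1(D) \cap C^1(\mathcal{T}_*)$, the components of $\mathbf{s}$ and their first partial
derivatives with respect to the space variables are bounded functions on $D \times \mathcal{T}_*$.
Define $\beta_0 = \beta_0(\mathbf{s})$ by

$$\beta_0 = \max_{D \times \mathcal{T}_*} \sum_{i=1}^{n} |s_i(\mathbf{x}, t)|,$$

and $\beta_j = \beta_j(\mathbf{s})$ by

$$\beta_j = \max_{D \times \mathcal{T}_*} \sum_{i=1}^{n} \left| \frac{\partial s_i(\mathbf{x}, t)}{\partial x_j} \right|$$

for $j = 1, \ldots, d$. Then it follows from Eq. (12) and the continuity assumption
on the matrices $\mathbf{A}_1, \ldots, \mathbf{A}_{d+1}$ that there is a continuous function $C_1 =
C_1(\beta_0, \ldots, \beta_d)$ such that

$$|(\mathbf{s}, G(\mathbf{s})\mathbf{s})| \le C_1 \|\mathbf{s}\|^2$$

for all $t \in \mathcal{T}_*$. Equation (11) then implies that

$$\|\mathbf{s}(\cdot, t)\| \le e^{C_1 (t - t_0)} \|\mathbf{s}(\cdot, t_0)\| \tag{13}$$

for all $t \in \mathcal{T}_*$.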

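A similar argument shows that if $\mathbf{r}$ and $\mathbf{s}$ are two classical solutions on $D \times \mathcal{T}_*$,
then there is a continuous function $C_2 = C_2(\beta_0(\mathbf{r}), \ldots, \beta_d(\mathbf{r}), \beta_0(\mathbf{s}), \ldots, \beta_d(\mathbf{s}))$
such that

$$\|\mathbf{r}(\cdot, t) - \mathbf{s}(\cdot, t)\| \le e^{C_2 (t - t_0)} \|\mathbf{r}(\cdot, t_0) - \mathbf{s}(\cdot, t_0)\| \tag{14}$$

for all $t \in \mathcal{T}_*$. Therefore, for each $\mathbf{s}_{t_0} \in C^1(D)$, there exists at most one classical
solution $\mathbf{s}$ on $D \times \mathcal{T}_*$ such that $\mathbf{s}(\mathbf{x}, t_0) = \mathbf{s}_{t_0}(\mathbf{x})$ for all $\mathbf{x} \in D$. This inequality
does not imply that if such a solution exists, then it depends continuously on
$\mathbf{s}_{t_0}$ in the norm $\|\cdot\|$, unless $C_2$ can be made to depend only on $\mathbf{r}_{t_0}$ and $\mathbf{s}_{t_0}$. This
is accomplished by means of the existence theory itself, discussed next.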

Denote by $H^k = H^k(D)$, for $k = 0, 1, \ldots$, the Sobolev space of real $n$-vectors
on $D$ with $k$ Lebesgue square-integrable derivatives on $D$. The spaces $H^k$ are
Hilbert spaces, with inner product

$$(\mathbf{g}, \mathbf{h})_{H^k} = \sum_{l=0}^{k} \sum_{l_1 + \cdots + l_d = l} (D^l \mathbf{g}, D^l \mathbf{h})$$

for all $\mathbf{g}, \mathbf{h} \in H^k$, where

$$D^l = \frac{\partial^l}{\partial x_1^{l_1} \cdots \partial x_d^{l_d}},$$

and corresponding norm $\|\mathbf{h}\|_{H^k} = (\mathbf{h}, \mathbf{h})_{H^k}^{1/2}$ for all $\mathbf{h} \in H^k$. Note that $H^m \subset
H^k \subset H^0 = \mathcal{H}$ for $0 \le k \le m$. The Sobolev lemma (e.g. Kreiss and Lorenz
(1989, Appendix 3, pp. 371-387)) says that if $\mathbf{h} \in H^k$ and $k \ge [\frac{d}{2}] + 1$, where $[y]$
denotes the largest integer less than or equal to $y$, then $\mathbf{h}$ is a bounded function
on $D$, with bound

$$\max_{\mathbf{x} \in D} \sum_{i=1}^{n} |h_i(\mathbf{x})| \le \alpha_k \|\mathbf{h}\|_{H^k}, \tag{15}$$

where the constant $\alpha_k$ depends on $L_1, \ldots, L_d$ but not on $\mathbf{h}$. It follows that if
$\mathbf{h} \in H^k$ and $k \ge [\frac{d}{2}] + l + 1$ for some positive integer $l$, then all of the $l$th-order
partial derivatives of $\mathbf{h}$ are bounded functions on $D$, with bound

$$\max_{\mathbf{x} \in D} \sum_{i=1}^{n} |D^l h_i(\mathbf{x})| \le \alpha_k \|\mathbf{h}\|_{H^k}, \tag{16}$$

and in particular, $\mathbf{h} \in C^{l-1}(D)$, since otherwise the $l$th-order partial derivatives
of $\mathbf{h}$ are not defined as bounded functions. Thus, for any nonnegative integer $l$,
$H^k = H^k(D) \subset C^l(D)$ if $k \ge [\frac{d}{2}] + l + 2$.

Suppose now that $\mathbf{s}_{t_0} \in H^k$ with $k \ge [d/2] + 3$. According to the existence
theory for linear and quasilinear symmetric hyperbolic systems (e.g. Courant
and Hilbert (1962, pp. 668-676)), there is a time interval $\mathcal{T}_* \subset \mathcal{T}_1$ for which
Eq. (10) has a solution $\mathbf{s} \in H^k \cap C^1(\mathcal{T}_*)$ with $\mathbf{s}(\mathbf{x}, t_0) = \mathbf{s}_{t_0}(\mathbf{x})$ for all $\mathbf{x} \in D$,
which is the classical solution since $H^k \subset C^1(D)$, and the solution remains in
existence as long as $t \le T_1$ and $\mathbf{s}(\cdot, t) \in H^k$. This is completely analogous to
the situation for ordinary differential equations: the first time $t$ such that the
solution $\mathbf{s}(\cdot, t) \notin H^k$, if such a time is reached, is the first time the solution hits
the “boundary” of $H^k$, $\|\mathbf{s}(\cdot, t)\|_{H^k} = \infty$. Typically the first partial derivatives
of the classical solution become unbounded in finite time, even if $\mathbf{s}_{t_0} \in C^\infty(D)$
(e.g. Lax (1973, Theorem 6.1, p. 37)).

A minimum existence time for the solution $\mathbf{s} \in H^k \cap C^1(\mathcal{T}_*)$, $k \ge [d/2] + 3$,
can be found in the following way. For any $\mathbf{s} \in H^k \cap C^1(\mathcal{T}_*)$,

$$\frac{d}{dt} \|\mathbf{s}\|_{H^k}^2 = -2 (\mathbf{s}, G(\mathbf{s})\mathbf{s})_{H^k}$$

is continuous on $\mathcal{T}_*$, as in Eq. (11). An integration by parts using the symmetry
of the matrices $A_1, \ldots, A_d$, along with the Sobolev inequalities Eqs. (15, 16),
shows that there is a function $\phi \in C^1([0, \infty))$ such that

$$|(\mathbf{s}, G(\mathbf{s})\mathbf{s})_{H^k}| \le \phi(\|\mathbf{s}\|_{H^k}) \|\mathbf{s}\|_{H^k};$$

see Kreiss and Lorenz (1989, pp. 190-196) for details. It follows that the solution
$\mathbf{s}(\cdot, t)$ exists in $H^k$ as long as $t \le T_1$ and the solution $y(t)$ of the ordinary
differential equation $dy/dt = \phi(y)$ with $y(t_0) = \|\mathbf{s}_{t_0}\|_{H^k}$ remains finite. Further,
there is a time $T_2 > t_0$ depending continuously on $\|\mathbf{s}_{t_0}\|_{H^k}$, $T_2 = T_2(\|\mathbf{s}_{t_0}\|_{H^k}) \le T_1$,
for which $\|\mathbf{s}(\cdot, t)\|_{H^k}$ can be bounded in terms of $\|\mathbf{s}_{t_0}\|_{H^k}$, say

$$\|\mathbf{s}(\cdot, t)\|_{H^k} \le \sqrt{2}\, \|\mathbf{s}_{t_0}\|_{H^k},$$

for all $t \in [t_0, T_2]$ (e.g. Kreiss and Lorenz (1989, Lemma 6.4.4, p. 196)). Then
by the continuity of $T_2(\|\mathbf{s}_{t_0}\|_{H^k})$, it follows that if $\mathbf{s}_{t_0}$ is restricted to be in any
bounded set in $H^k$, say if $\mathbf{s}_{t_0} \in H^k_E$ for some $E < \infty$, where

$$H^k_E = \{\mathbf{h} \in H^k : \|\mathbf{h}\|_{H^k}^2 < E\},$$

then $T_2$ becomes independent of $\mathbf{s}_{t_0}$ (but depends on $E$), and the solution $\mathbf{s}$
corresponding to any $\mathbf{s}_{t_0} \in H^k_E$ satisfies

$$\|\mathbf{s}(\cdot, t)\|_{H^k}^2 \le 2E$$

for all $t \in [t_0, T_2]$. Also, since $H^k_E$ is open as a set in $H^k$, and since $\|\mathbf{h}\| \le \|\mathbf{h}\|_{H^k}$
for all $\mathbf{h} \in H^k$, $H^k_E$ is open as a set in $L^2(D)$, and therefore $H^k_E \in \mathcal{B}(L^2(D))$.
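
The role of the comparison equation $dy/dt = \phi(y)$ in fixing a minimum existence time can be made concrete with a small numerical sketch. The function $\phi$ is not specified in the text, so the quadratic choice `PHI` below is purely hypothetical; with it, $y(t) = y_0 / (1 - y_0 (t - t_0))$ blows up in finite time, and the time at which $y$ reaches $\sqrt{2}\, y_0$ is available in closed form for comparison. The names `existence_time` and `closed_form_time` are illustrative.

```python
import math


def existence_time(y0, phi, growth=math.sqrt(2.0), dt=1e-5, t_max=100.0):
    """Integrate dy/dt = phi(y) by RK4 until y >= growth * y0; return the elapsed time."""
    y, t = y0, 0.0
    target = growth * y0
    while y < target and t < t_max:
        k1 = phi(y)
        k2 = phi(y + 0.5 * dt * k1)
        k3 = phi(y + 0.5 * dt * k2)
        k4 = phi(y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += dt
    return t


def PHI(y):
    """Hypothetical growth function phi(y) = y^2 (an assumption, not from the text)."""
    return y * y


def closed_form_time(y0):
    """Time at which y0 / (1 - y0 t) equals sqrt(2) * y0."""
    return (1.0 - 1.0 / math.sqrt(2.0)) / y0


T2_NUMERIC = existence_time(2.0, PHI)      # initial norm sqrt(E) with E = 4
T2_EXACT = closed_form_time(2.0)
```

Under this assumed $\phi$, a larger bound $E$ on the initial data gives a shorter guaranteed interval, which is the sense in which $T_2$ depends on $E$ but not on the individual initial state.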

3.2.2 The solution operator 

Thus take $\mathcal{S} = H^k_E$ for any $E < \infty$ and $k \ge [d/2] + 3$, and take $\mathcal{T} = \mathcal{T}(\mathcal{S}) = [t_0, T_2]$.
Then $\mathcal{S} \in \mathcal{B}(\mathcal{H}) = \mathcal{B}(L^2(D))$, and the unique classical solution $\mathbf{s}(\cdot, t)$
corresponding to each $\mathbf{s}_{t_0} \in \mathcal{S}$ exists in $H^k_{2E} \subset H^k \subset \mathcal{H} = L^2(D)$ for all $t \in \mathcal{T}$.
Denoting this solution by $\mathbf{s}_t = N_{t,t_0}(\mathbf{s}_{t_0})$, it follows that $N_{t,t_0}$ is defined uniquely
on $\mathcal{S}$, as a map from $\mathcal{S}$ into $\mathcal{H}$, for all $t \in \mathcal{T}$. Further, since $\mathbf{s}_t \in H^k_{2E}$ for all
$t \in \mathcal{T}$, it follows from the Sobolev inequalities that the function $C_2$ in Eq. (14)
depends only on $E$ and $\alpha_k$, and therefore the map $N_{t,t_0}$ is continuous in the
norm $\|\cdot\|$, for all $t \in \mathcal{T}$. Note that $\mathcal{S} = H^k_E \subset \mathcal{H}_E$, where $\mathcal{H}_E$ was defined in
Eq. (7), since $\|\mathbf{h}\| \le \|\mathbf{h}\|_{H^k}$ for all $\mathbf{h} \in H^k$. It was necessary to define $\mathcal{S}$ as a
bounded set in $H^k$, and therefore as a bounded set in $L^2(D)$.

The solution operator $N_{t,t_0}$ is bounded not only as a map from $\mathcal{S}$ into $\mathcal{H}$,
with the function $C_1$ in Eq. (13) now being a constant depending only on $E$ and
$\alpha_k$, but also as a map from $\mathcal{S}$ into $H^k$, with

$$\|\mathbf{s}_t\|_{H^k} = \|N_{t,t_0}(\mathbf{s}_{t_0})\|_{H^k} \le \sqrt{2}\, \|\mathbf{s}_{t_0}\|_{H^k}$$

for all $t \in \mathcal{T}$. According to Eq. (11), the solution operator is conservative if the
differential operator $G$ is skew-symmetric,

$$(\mathbf{s}, G(\mathbf{r})\mathbf{s}) = 0$$

for all $\mathbf{r}(\cdot, t), \mathbf{s}(\cdot, t) \in H^k$ and $t \in \mathcal{T}$. This conservation condition is met for
an important class of symmetric hyperbolic systems (Lax (1973, p. 31)), but
often a change of dependent variables which destroys symmetry of the matrices
$A_1, \ldots, A_d$ is necessary to obtain conservation in $\mathcal{H} = L^2(D)$, as will be the
case for the shallow-water equations.

It has been shown that, for each $\mathbf{s}_{t_0} \in \mathcal{S} = H^k_E$, with $E < \infty$, $k \ge [d/2] + l + 2$
and $l \ge 1$, the unique corresponding solution $\mathbf{s} = \mathbf{s}(\mathbf{x}, t)$ is of class $C^l(D) \cap C^1(\mathcal{T})$,
for $\mathcal{T} = [t_0, T]$ and an appropriately defined $T$ depending on $\alpha_k$ and $E$,
and that $\|\mathbf{s}(\cdot, t)\|_{H^k}^2 \le 2E$ for all $t \in \mathcal{T}$. It is important to have a condition
to guarantee further that $\mathbf{s} \in C^l(D \times \mathcal{T})$, particularly for the shallow-water
example. To this end, denote by $L^2(D \times \mathcal{T})$ the Hilbert space of real, Lebesgue
square-integrable $n$-vectors on $D \times \mathcal{T}$, with inner product

$$(\mathbf{g}, \mathbf{h})_{\mathcal{T}} = \int_{t_0}^{T} (\mathbf{g}, \mathbf{h}) \, dt$$

for all $\mathbf{g}, \mathbf{h} \in L^2(D \times \mathcal{T})$, and corresponding norm $\|\mathbf{h}\|_{\mathcal{T}} = (\mathbf{h}, \mathbf{h})_{\mathcal{T}}^{1/2}$ for all
$\mathbf{h} \in L^2(D \times \mathcal{T})$. Also, denote by $H^m(D \times \mathcal{T})$, for $m = 0, 1, \ldots$, the Sobolev
space of real $n$-vectors on $D \times \mathcal{T}$ with $m$ Lebesgue square-integrable mixed
space and time partial derivatives on $D \times \mathcal{T}$, with the Sobolev inner product
and norm. Thus, for any nonnegative integer $l$, $H^m(D \times \mathcal{T}) \subset C^l(D \times \mathcal{T})$ if
$m \ge [(d+1)/2] + l + 2$. Now, the differential equations Eq. (10) can be used to
express all mixed space-time partial derivatives of the solution up to any order
$m$ in terms of pure spatial partial derivatives up to order $m$. But

$$\int_{t_0}^{T} \|\mathbf{s}(\cdot, t)\|_{H^k}^2 \, dt \le 2E(T - t_0) < \infty$$

since $\|\mathbf{s}(\cdot, t)\|_{H^k}^2 \le 2E$ for all $t \in \mathcal{T}$, and therefore $\mathbf{s} \in H^k(D \times \mathcal{T})$. Thus, for
each $\mathbf{s}_{t_0} \in \mathcal{S} = H^k_E$ with $E < \infty$, $k \ge [(d+1)/2] + l + 2$ and $l \ge 1$, the unique
corresponding solution $\mathbf{s} = \mathbf{s}(\mathbf{x}, t)$ is of class $C^l(D \times \mathcal{T})$.

3.2.3 The stochastic initial-value problem

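With $\mathcal{H} = L^2(D)$, $\mathcal{S} = H^k_E$, $E < \infty$, $k \ge [d/2] + l + 2$ and $l \ge 1$, let $\mathcal{T} = [t_0, T]$
and $N_{t,t_0}$ be as defined in Section 3.2.2, let $t \in \mathcal{T}$, and suppose that $\mathbf{s}_{t_0}$ is
an $\mathcal{S}$-valued random variable. Since $\mathcal{S} \subset \mathcal{H}_E$, it follows from the discussion of
Section 2.4 that $\mathbf{s}_{t_0}$ is a second-order $\mathcal{S}$-valued random variable. Therefore by
Theorem 1, $\mathbf{s}_t = N_{t,t_0}(\mathbf{s}_{t_0})$ is a second-order $\mathcal{H}$-valued random variable, with
mean $\bar{\mathbf{s}}_t \in \mathcal{H}$ and covariance operator $\mathcal{P}_t : \mathcal{H} \to \mathcal{H}$, which are related by

$$\|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t \le e^{2 C_1 (t - t_0)} \left( \|\bar{\mathbf{s}}_{t_0}\|^2 + \operatorname{tr} \mathcal{P}_{t_0} \right),$$

where $C_1$ is the constant in Eq. (13). In fact,

$$\|\mathbf{s}_t(\omega)\|^2 \le \|\mathbf{s}_t(\omega)\|_{H^k}^2 \le 2 \|\mathbf{s}_{t_0}(\omega)\|_{H^k}^2 < 2E$$

for all $\omega \in \Omega$, and so $\mathbf{s}_t$ is a second-order $H^k$-valued random variable with

$$\mathcal{E}\|\mathbf{s}_t\|^2 = \|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t \le \mathcal{E}\|\mathbf{s}_t\|_{H^k}^2 \le 2\, \mathcal{E}\|\mathbf{s}_{t_0}\|_{H^k}^2 < 2E.$$

The covariance operator $\mathcal{P}_t$ can be expressed in the following tangible way,
which will lead also to a simple expression for $\operatorname{tr} \mathcal{P}_t$. Since $\mathcal{P}_t$ is a trace class
operator, $\mathcal{P}_t$ is also a Hilbert-Schmidt operator. Since $\mathcal{H} = L^2(D)$ and
$\mathcal{P}_t : \mathcal{H} \to \mathcal{H}$, it follows (e.g. Reed and Simon (1972, Theorem VI.23, p. 210)) that
there is a real $n \times n$ matrix function $P_t \in L^2(D \times D)$, called the covariance
matrix of $\mathbf{s}_t$, such that

$$(\mathcal{P}_t \mathbf{h})(\mathbf{x}) = \int_D P_t(\mathbf{x}, \mathbf{y}) \mathbf{h}(\mathbf{y}) \, d\mathbf{y}$$

for all $\mathbf{h} \in \mathcal{H}$, where $d\mathbf{y} = dy_1 \cdots dy_d$, and moreover, that

$$\int_D \int_D \operatorname{tr} P_t(\mathbf{x}, \mathbf{y}) P_t^T(\mathbf{x}, \mathbf{y}) \, d\mathbf{x} \, d\mathbf{y} = \sum_{i=1}^{\infty} \lambda_i^2(t) < \infty, \tag{17}$$

where $\operatorname{tr} A$ denotes the trace of a matrix $A$ and $\{\lambda_i(t)\}_{i=1}^{\infty}$ are the eigenvalues
of the covariance operator $\mathcal{P}_t$. Thus

$$\mathcal{E}(\mathbf{g}, \mathbf{s}_t - \bar{\mathbf{s}}_t)(\mathbf{h}, \mathbf{s}_t - \bar{\mathbf{s}}_t) = (\mathbf{g}, \mathcal{P}_t \mathbf{h}) = \int_D \int_D \mathbf{g}^T(\mathbf{x}) P_t(\mathbf{x}, \mathbf{y}) \mathbf{h}(\mathbf{y}) \, d\mathbf{x} \, d\mathbf{y} \tag{18}$$

for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. Since $\mathcal{P}_t$ is self-adjoint, the covariance matrix $P_t$ has the
symmetry property $P_t^T(\mathbf{x}, \mathbf{y}) = P_t(\mathbf{y}, \mathbf{x})$ for all $\mathbf{x}, \mathbf{y} \in D$.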

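Now let $\{\mathbf{h}_i(\cdot, t)\}_{i=1}^{\infty}$ denote the orthonormal eigenvectors (eigenfunctions)
of $\mathcal{P}_t$ corresponding to the eigenvalues $\{\lambda_i(t)\}_{i=1}^{\infty}$,

$$\mathcal{P}_t \mathbf{h}_i = \lambda_i \mathbf{h}_i$$

for $i = 1, 2, \ldots$. The eigenvalues are all nonnegative since the covariance operator
is positive semidefinite, and the eigenvectors form an orthonormal basis for
$\mathcal{H}$ since the covariance operator is Hilbert-Schmidt. From Eq. (18) and the
orthonormality of the eigenvectors, it follows that

$$\int_D \int_D \mathbf{h}_i^T(\mathbf{x}, t) P_t(\mathbf{x}, \mathbf{y}) \mathbf{h}_j(\mathbf{y}, t) \, d\mathbf{x} \, d\mathbf{y} = \lambda_i(t) \delta_{ij}$$

for $i, j = 1, 2, \ldots$, where $\delta_{ij}$ is the Kronecker delta. Therefore, $P_t$ has the
representation

$$P_t(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{\infty} \lambda_i(t) \mathbf{h}_i(\mathbf{x}, t) \mathbf{h}_i^T(\mathbf{y}, t), \tag{19}$$

where the convergence is in $L^2(D \times D)$ as indicated in Eq. (17). Since the
eigenvectors form an orthonormal basis for $\mathcal{H}$ and since $\mathcal{P}_t$ is trace class,

$$\operatorname{tr} \mathcal{P}_t = \sum_{i=1}^{\infty} \lambda_i(t) < \infty.$$

But according to Eq. (19),

$$\operatorname{tr} P_t(\mathbf{x}, \mathbf{x}) = \sum_{i=1}^{\infty} \lambda_i(t) \mathbf{h}_i^T(\mathbf{x}, t) \mathbf{h}_i(\mathbf{x}, t),$$

and therefore

$$\int_D \operatorname{tr} P_t(\mathbf{x}, \mathbf{x}) \, d\mathbf{x} = \sum_{i=1}^{\infty} \lambda_i(t)$$

by the normality of the eigenvectors. Thus,

$$\operatorname{tr} \mathcal{P}_t = \int_D \operatorname{tr} P_t(\mathbf{x}, \mathbf{x}) \, d\mathbf{x}. \tag{20}$$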
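
The identity between the operator trace and the integrated diagonal of the covariance matrix can be checked numerically in a simple setting. The sketch below is an illustration under assumptions that are not in the text: a scalar field ($n = 1$) on $D = [0, 1]$, the hypothetical orthonormal eigenfunctions $\sqrt{2} \sin(i \pi x)$, and ten hypothetical eigenvalues $1/i^2$. It builds the kernel from the eigen-expansion and integrates its diagonal with the midpoint rule.

```python
import math


def eigenfunction(i, x):
    """Hypothetical orthonormal eigenfunction sqrt(2) sin(i pi x) on [0, 1]."""
    return math.sqrt(2.0) * math.sin(i * math.pi * x)


def covariance_kernel(x, y, eigenvalues):
    """Truncated expansion P(x, y) = sum_i lambda_i h_i(x) h_i(y)."""
    return sum(lam * eigenfunction(i, x) * eigenfunction(i, y)
               for i, lam in enumerate(eigenvalues, start=1))


def integrated_diagonal(eigenvalues, n_cells=200):
    """Midpoint-rule approximation of the integral of P(x, x) over [0, 1]."""
    h = 1.0 / n_cells
    return sum(covariance_kernel((k + 0.5) * h, (k + 0.5) * h, eigenvalues) * h
               for k in range(n_cells))


EIGENVALUES = [1.0 / i ** 2 for i in range(1, 11)]   # assumed spectrum
TRACE_FROM_EIGENVALUES = sum(EIGENVALUES)
TRACE_FROM_KERNEL = integrated_diagonal(EIGENVALUES)
```

The two numbers agree to rounding error here because the midpoint rule integrates $\sin^2(i \pi x)$ exactly on a uniform grid whenever $i$ is not a multiple of the number of cells; for a general kernel the agreement would only be up to quadrature error.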

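Now recall that $\mathbf{s}_t$ is a second-order $H^k$-valued random variable. Therefore
$\bar{\mathbf{s}}_t \in H^k$, $\mathcal{P}_t$ maps $H^k$ into $H^k$, and

$$\mathcal{E}\|\mathbf{s}_t\|_{H^k}^2 = \|\bar{\mathbf{s}}_t\|_{H^k}^2 + \operatorname{tr} \mathcal{P}_t.$$

Also, $\bar{\mathbf{s}}_t \in C^l(D)$ since $H^k \subset C^l(D)$. Further, since $\mathcal{P}_t$ maps $H^k$ into $H^k$, the
eigenvectors of $\mathcal{P}_t$ form an orthonormal basis for $H^k$, and therefore they are all
in $C^l(D)$.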

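Finally, let $\{\mathbf{h}_i\}_{i=1}^{\infty}$ be an orthonormal basis for $H^k$. Since $\mathbf{s}_{t_0}$ is an
$H^k_E$-valued random variable, it follows from the discussion of Section 2.4 that

$$\mathbf{s}_{t_0}(\omega) = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{s}_{t_0}(\omega))_{H^k} \mathbf{h}_i$$

for all $\omega \in \Omega$, with

$$\|\mathbf{s}_{t_0}(\omega)\|_{H^k}^2 = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{s}_{t_0}(\omega))_{H^k}^2 < E$$

for all $\omega \in \Omega$. It follows also that if $\{s_i\}_{i=1}^{\infty}$ is any collection of scalar random
variables with

$$\sum_{i=1}^{\infty} s_i^2(\omega) < E$$

for all $\omega \in \Omega$, then $\sum_{i=1}^{\infty} s_i(\omega) \mathbf{h}_i$ is an $H^k_E$-valued random variable.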
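
One way to obtain coefficients whose squared sum stays below $E$ for every sample is to scale bounded random factors by a square-summable deterministic sequence. The sketch below assumes a particular, hypothetical coefficient law, $s_i = \sigma_i \xi_i$ with $\sigma_i = c / i$ and $\xi_i$ uniform on $[-1, 1]$, and a truncation to 50 terms; none of these choices comes from the text. The basis $\{\mathbf{h}_i\}$ stays abstract, since by Parseval's relation only the coefficients matter for the norm.

```python
import math
import random


def sample_coefficients(E, n_terms, rng):
    """Draw s_i = (c / i) * xi_i with |xi_i| <= 1, so that sum_i s_i^2 < E always."""
    # sum_i (c / i)^2 < c^2 * pi^2 / 6 = 0.9801 * E for the scaling chosen here
    c = 0.99 * math.sqrt(6.0 * E) / math.pi
    return [c / i * rng.uniform(-1.0, 1.0) for i in range(1, n_terms + 1)]


def squared_norm(coeffs):
    """Parseval: squared norm of sum_i s_i h_i for an orthonormal basis {h_i}."""
    return sum(s * s for s in coeffs)


E_BOUND = 2.5                      # hypothetical bound E
rng = random.Random(0)             # fixed seed for reproducibility
SAMPLES = [sample_coefficients(E_BOUND, 50, rng) for _ in range(1000)]
MAX_SQUARED_NORM = max(squared_norm(c) for c in SAMPLES)
```

The bound holds for every draw, not just on average, which is what membership in a bounded set of the Hilbert space requires.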


4 The shallow-water equations 


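The global nonlinear shallow-water equations written for the zonal and meridional
velocity components and geopotential, $u$, $v$ and $\Phi$, respectively, are the
system

$$\frac{\partial u}{\partial t} + \mathbf{V} \cdot \nabla u - \left( f + \frac{u}{a} \tan\phi \right) v + \frac{1}{a \cos\phi} \frac{\partial \Phi}{\partial \lambda} = 0,$$

$$\frac{\partial v}{\partial t} + \mathbf{V} \cdot \nabla v + \left( f + \frac{u}{a} \tan\phi \right) u + \frac{1}{a} \frac{\partial \Phi}{\partial \phi} = 0,$$

$$\frac{\partial \Phi}{\partial t} + \mathbf{V} \cdot \nabla \Phi + \Phi \nabla \cdot \mathbf{V} = 0,$$

where $\mathbf{V}$ is the wind vector, $\phi$ the latitude, $\lambda$ the longitude, $a$ the Earth radius,
and $f$ the Coriolis parameter. The change of variable $\Phi = w^2/4$ yields the
symmetric hyperbolic system

$$\frac{\partial \mathbf{s}}{\partial t} + \left[ \frac{1}{a \cos\phi} A \frac{\partial}{\partial \lambda} + \frac{1}{a} B \frac{\partial}{\partial \phi} + C \right] \mathbf{s} = 0, \tag{21}$$

where $\mathbf{s} = (u, v, w)^T$,

$$A = \begin{pmatrix} u & 0 & \tfrac{1}{2} w \\ 0 & u & 0 \\ \tfrac{1}{2} w & 0 & u \end{pmatrix}, \qquad
B = \begin{pmatrix} v & 0 & 0 \\ 0 & v & \tfrac{1}{2} w \\ 0 & \tfrac{1}{2} w & v \end{pmatrix},$$

$$C = \begin{pmatrix} 0 & -\left( f + \frac{u}{a} \tan\phi \right) & 0 \\ f + \frac{u}{a} \tan\phi & 0 & 0 \\ 0 & -\frac{w}{2a} \tan\phi & 0 \end{pmatrix}.$$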
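
The effect of the change of variable can be verified term by term. The following worked step is a sketch, assuming the standard spherical-coordinate divergence $\nabla \cdot \mathbf{V} = \frac{1}{a \cos\phi} \left[ \frac{\partial u}{\partial \lambda} + \frac{\partial (v \cos\phi)}{\partial \phi} \right]$, which the text does not write out:

```latex
\begin{aligned}
\Phi = \tfrac{1}{4} w^2 \;&\Longrightarrow\;
\frac{\partial \Phi}{\partial \lambda} = \frac{w}{2} \frac{\partial w}{\partial \lambda}, \qquad
\frac{\partial \Phi}{\partial \phi} = \frac{w}{2} \frac{\partial w}{\partial \phi}, \\[4pt]
\frac{\partial \Phi}{\partial t} + \mathbf{V} \cdot \nabla \Phi + \Phi \, \nabla \cdot \mathbf{V}
&= \frac{w}{2} \left( \frac{\partial w}{\partial t} + \mathbf{V} \cdot \nabla w
   + \frac{w}{2} \, \nabla \cdot \mathbf{V} \right) = 0, \\[4pt]
\frac{w}{2} \, \nabla \cdot \mathbf{V}
&= \frac{1}{a \cos\phi} \, \frac{w}{2} \, \frac{\partial u}{\partial \lambda}
 + \frac{1}{a} \, \frac{w}{2} \, \frac{\partial v}{\partial \phi}
 - \frac{w}{2a} \tan\phi \; v .
\end{aligned}
```

The factor $w/2$ multiplying $\partial w / \partial \lambda$ and $\partial w / \partial \phi$ in the momentum equations is the same factor that multiplies $\partial u / \partial \lambda$ and $\partial v / \partial \phi$ in the equation for $w$, which is why the off-diagonal entries of $A$ and $B$ come out symmetric, while the metric term $-\frac{w}{2a} \tan\phi \; v$ is collected in $C$.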


Now let $\mathcal{H} = L^2(S)$, the Hilbert space of real square-integrable 3-vectors on
the sphere $S$ of radius $a$, with inner product $(\cdot, \cdot)$ and corresponding norm $\|\cdot\|$.
Appendix B.2 establishes a Sobolev-type lemma for the family of Hilbert spaces
$\{\Phi_p = \Phi_p(S), \; p \ge 0\}$,

$$\Phi_p = \{\mathbf{h} \in L^2(S) : \|(I - \Delta)^p \mathbf{h}\| < \infty\},$$

where $\Delta$ is the Laplacian operator on the sphere, with inner product

$$(\mathbf{g}, \mathbf{h})_p = ((I - \Delta)^p \mathbf{g}, (I - \Delta)^p \mathbf{h})$$

for all $\mathbf{g}, \mathbf{h} \in \Phi_p$, and corresponding norm $\|\mathbf{h}\|_p = (\mathbf{h}, \mathbf{h})_p^{1/2}$ for all $\mathbf{h} \in \Phi_p$.
Thus if $\mathbf{h} \in \Phi_p$ and $p$ is a positive integer or half-integer, then all partial
derivatives of the components of $\mathbf{h}$ up to order $2p$ are square-integrable. The
spaces $\Phi_p$ are convenient for spherical geometry since the Laplacian operator is
coordinate-free. The existence and uniqueness theory of Section 3.2 based on
Sobolev spaces and inequalities carries over to the sphere using the spaces $\Phi_p$
for integers and half-integers $p$.

Tentatively let

$$\mathcal{S} = \{\mathbf{h} \in \Phi_2 : \|\mathbf{h}\|_2^2 < E\}$$

for some $E < \infty$. It follows from Appendix B.2 that

$$\mathcal{S} \subset \Phi_2 \subset C^1(S),$$

as expected from Section 3.2.1. It follows from Section 3.2.2 that for the symmetric
hyperbolic system Eq. (21) there is a time interval $\mathcal{T} = \mathcal{T}(\mathcal{S}) = [t_0, T]$
such that, corresponding to each $\mathbf{s}_{t_0} \in \mathcal{S}$, there exists a unique classical solution
$\mathbf{s}_t \in \Phi_2$ for all $t \in \mathcal{T}$, and further that $\mathbf{s} \in C^1(S \times \mathcal{T})$.

However, since $w = 2\Phi^{1/2}$, this solution does not solve the original shallow-water
system unless $w > 0$ on $S \times \mathcal{T}$. The differential equation for $w$ is

$$\frac{\partial w}{\partial t} + \mathbf{V} \cdot \nabla w + \frac{1}{2} w \nabla \cdot \mathbf{V} = 0,$$

and therefore along the curves $\mathbf{x} = \mathbf{x}(t) = (\lambda(t), \phi(t))$ defined by

$$\frac{d\mathbf{x}}{dt} = \mathbf{V}(\mathbf{x}, t), \tag{22}$$

the solution $w$ satisfies the ordinary differential equation

$$\frac{dw}{dt} + \frac{1}{2} w \nabla \cdot \mathbf{V} = 0.$$

This guarantees that if $w > 0$ initially, then $w > 0$ for all $t \in \mathcal{T}$. Thus redefine
$\mathcal{S}$ as

$$\mathcal{S} = \{\mathbf{h} \in \Phi_2 : \|\mathbf{h}\|_2^2 < E\} \cap \{(u, v, w) \in \Phi_2 : w > 0\}.$$

Note that the latter set is open in $L^2(S)$ since it is open in $\Phi_2$, and therefore
$\mathcal{S} \in \mathcal{B}(L^2(S))$ since the intersection of two open sets is open. Also note that
the initial-value problem for Eq. (22) is well-posed since $\mathbf{V} \in C^1(S \times \mathcal{T})$.

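The classical solutions of the shallow-water equations satisfy the energy equation

$$\frac{\partial}{\partial t} \left[ \Phi (u^2 + v^2) + \Phi^2 \right] + \nabla \cdot \left\{ \left[ \Phi (u^2 + v^2) + 2 \Phi^2 \right] \mathbf{V} \right\} = 0.$$

This suggests introducing a new set of dependent variables, the energy variables
$\mathbf{s} = (\alpha, \beta, \Phi)^T$ with $\alpha = u \Phi^{1/2}$ and $\beta = v \Phi^{1/2}$. In the energy variables, the
physical total energy is just $\frac{1}{2} \|\mathbf{s}\|^2$, and it is conserved. It can be verified that
in terms of the energy variables, the shallow-water system can be written as

$$\frac{\partial \mathbf{s}}{\partial t} + G \mathbf{s} = 0, \tag{23}$$

where $G = G(\mathbf{s})$ has the form

$$G = \frac{1}{a \cos\phi} A \frac{\partial}{\partial \lambda} + \frac{1}{a} B \frac{\partial}{\partial \phi} + \frac{1}{2} \frac{1}{a \cos\phi} \left( \frac{\partial A}{\partial \lambda} + \frac{\partial (B \cos\phi)}{\partial \phi} \right) + C$$

with

$$A = \begin{pmatrix} \alpha \Phi^{-1/2} & 0 & \tfrac{4}{5} \Phi^{1/2} \\ 0 & \alpha \Phi^{-1/2} & 0 \\ \tfrac{4}{5} \Phi^{1/2} & 0 & \tfrac{2}{5} \alpha \Phi^{-1/2} \end{pmatrix}, \qquad
B = \begin{pmatrix} \beta \Phi^{-1/2} & 0 & 0 \\ 0 & \beta \Phi^{-1/2} & \tfrac{4}{5} \Phi^{1/2} \\ 0 & \tfrac{4}{5} \Phi^{1/2} & \tfrac{2}{5} \beta \Phi^{-1/2} \end{pmatrix},$$

$$C = \begin{pmatrix} 0 & -\left( f + \frac{u}{a} \tan\phi \right) & 0 \\ f + \frac{u}{a} \tan\phi & 0 & \frac{2}{5a} \Phi^{1/2} \tan\phi \\ 0 & -\frac{2}{5a} \Phi^{1/2} \tan\phi & 0 \end{pmatrix},$$

where $u = \alpha \Phi^{-1/2}$.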
For the system (23) to yield the solution of the original shallow-water system
requires being able to recover $u = \alpha \Phi^{-1/2}$ and $v = \beta \Phi^{-1/2}$ from $\alpha$, $\beta$ and $\Phi$.
Now, products of scalars in $\Phi_2$ are also scalars in $\Phi_2$ since the elements of $\Phi_2$ are
all bounded, continuous functions. But $\Phi^{-1/2}$ is not in $\Phi_2$ unless $\Phi$ is bounded
from below by a positive constant. Thus for the energy variables, the initial
space $\mathcal{S}$ is defined as $\mathcal{S} = \mathcal{S}_\gamma$, where

$$\mathcal{S}_\gamma = \{\mathbf{h} \in \Phi_2 : \|\mathbf{h}\|_2^2 < E\} \cap \{(\alpha, \beta, \Phi) \in \Phi_2 : \Phi > \gamma\}$$

for some constant $\gamma > 0$. It follows for the energy variables that for all $t \in \mathcal{T}$,
the unique solution $\mathbf{s}_t$ corresponding to each $\mathbf{s}_{t_0} \in \mathcal{S}_\gamma$ is in $\mathcal{S}_\delta$ for some constant
$\delta > 0$. The symmetry of the matrices $A$ and $B$, the skew-symmetry of the
matrix $C$, and the form of the differential operator $G$ imply immediately that,
for all $\delta > 0$, $G$ is a skew-symmetric operator on $\mathcal{S}_\delta$:

$$(\mathbf{s}, G(\mathbf{r})\mathbf{s}) = 0$$

for all $\mathbf{r}, \mathbf{s} \in \mathcal{S}_\delta$. This of course implies energy conservation, as noted in Section
3.2.2.
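
Why an operator of the form "coefficient times derivative, plus one half the derivative of the coefficient" is skew-symmetric can be seen in a one-dimensional discrete analogue. The sketch below is an illustration only, not the spherical operator of the text: it assumes a scalar periodic grid, a hypothetical coefficient field `r`, and centered differences arranged so that the discrete operator approximates $r \, \partial s / \partial x + \frac{1}{2} (\partial r / \partial x) s$. The discrete inner product $(s, G(r)s)$ then vanishes to rounding error for any `r` and `s`.

```python
import math


def apply_G(r, s, h):
    """Periodic centered-difference analogue of G(r)s = r ds/dx + (1/2)(dr/dx) s."""
    n = len(s)
    out = []
    for k in range(n):
        kp, km = (k + 1) % n, (k - 1) % n
        advective = r[k] * (s[kp] - s[km])          # r * ds/dx, times 2h
        flux = r[kp] * s[kp] - r[km] * s[km]        # d(r s)/dx, times 2h
        out.append((advective + flux) / (4.0 * h))  # average of the two forms
    return out


def inner(u, v, h):
    """Discrete L2 inner product on the periodic grid."""
    return sum(a * b for a, b in zip(u, v)) * h


N = 64
H = 2.0 * math.pi / N
X = [k * H for k in range(N)]
R = [1.0 + 0.5 * math.sin(x) for x in X]                    # hypothetical coefficient
S = [math.cos(2.0 * x) + 0.3 * math.sin(5.0 * x) for x in X]
ENERGY_RATE = inner(S, apply_G(R, S, H), H)
```

Averaging the advective form and the flux form is a standard way to build a skew-symmetric difference operator; the cancellation is exact term by term after shifting the summation index, so it does not depend on the smoothness of `R` or `S`.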

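Now suppose for the energy variables that $\mathbf{s}_{t_0}$ is an $\mathcal{S}_\gamma$-valued random variable.
Thus $\mathbf{s}_{t_0}$ is also second-order, and it follows from Theorem 1 that
$\mathbf{s}_t = N_{t,t_0}(\mathbf{s}_{t_0})$ is a second-order $L^2(S)$-valued random variable, with mean $\bar{\mathbf{s}}_t \in L^2(S)$
and covariance operator $\mathcal{P}_t : L^2(S) \to L^2(S)$, which are related by

$$\|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t = \|\bar{\mathbf{s}}_{t_0}\|^2 + \operatorname{tr} \mathcal{P}_{t_0} < E$$

for all $t \in \mathcal{T}$. The results of Section 3.2.3 show further that, for all $t \in \mathcal{T}$,
$\bar{\mathbf{s}}_t \in \Phi_2 \subset C^1(S)$,

$$\mathcal{E}\|\mathbf{s}_t\|^2 = \|\bar{\mathbf{s}}_t\|^2 + \operatorname{tr} \mathcal{P}_t \le \mathcal{E}\|\mathbf{s}_{t_0}\|_2^2 < E,$$

and

$$\operatorname{tr} \mathcal{P}_t = \int_S \operatorname{tr} P_t(\mathbf{x}, \mathbf{x}) \, a^2 \cos\phi \, d\phi \, d\lambda,$$

where $P_t$ is the covariance matrix of $\mathbf{s}_t$.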

The discussion at the end of Section 3.2.3 gives the general form of $\Phi_2$-valued
random variables with bounded $\Phi_2$ norm. The Sobolev-type inequality Eq. (35)
of Appendix B.2 then suggests a way of ensuring that such a random variable
$\mathbf{s}_{t_0}$ is an $\mathcal{S}_\gamma$-valued random variable. Let

$$\Phi_{t_0}(\omega) = \bar{\Phi}_{t_0} + \Phi'_{t_0}(\omega)$$

for all $\omega \in \Omega$. The series expansions in Appendix B.2 give the form of every
scalar in $\Phi_2$. Suppose that $\bar{\Phi}_{t_0} \in \Phi_2$ with $\bar{\Phi}_{t_0} \ge \mu > \gamma > 0$. Equation (35)
shows how to ensure that $|\Phi'_{t_0}(\omega)| < \mu - \gamma$ for all $\omega \in \Omega$, and therefore that
$\Phi_{t_0}(\omega) > \gamma$ for all $\omega \in \Omega$.


A Random variables taking values in Hilbert space

Appendix A.1 defines Hilbert space-valued random variables and gives some of
their main properties. Appendices A.2-A.4 give the definition, main properties
and general construction, respectively, of Hilbert space-valued random variables
of second order. Definitions of basic terms used in this appendix are provided in
Appendix C. Further treatment of Hilbert space-valued random variables, and
of random variables taking values in more general spaces, can be found in the
books of Ito (1984) and Kallianpur and Xiong (1995).

Hilbert space-valued random variables, like scalar random variables, are defined
with reference to some probability space $(\Omega, \mathcal{F}, P)$, with $\Omega$ the sample
space, $\mathcal{F}$ the event space and $P$ the probability measure. Thus throughout this
appendix, a probability space $(\Omega, \mathcal{F}, P)$ is considered to be given. The expectation
operator is denoted by $\mathcal{E}$. It is assumed that the given probability space is
complete.

A real, separable Hilbert space $\mathcal{H}$ is also considered to be given. The inner
product and corresponding norm on $\mathcal{H}$ are denoted by $(\cdot, \cdot)$ and $\|\cdot\|$, respectively.
The Borel field generated by the open sets in $\mathcal{H}$ is denoted by $\mathcal{B}(\mathcal{H})$, i.e., $\mathcal{B}(\mathcal{H})$
is the smallest $\sigma$-algebra of sets in $\mathcal{H}$ that contains all the open sets in $\mathcal{H}$. Recall
that every separable Hilbert space has a countable orthonormal basis, and that
every orthonormal basis of a separable Hilbert space has the same number of
elements $N \le \infty$, the dimension of the space. For notational convenience it is
assumed in this appendix that $\mathcal{H}$ is infinite-dimensional, with $\{\mathbf{h}_i\}_{i=1}^{\infty}$ denoting
an orthonormal basis for $\mathcal{H}$. The results of this appendix hold just as well in
the finite-dimensional case, by taking $\{\mathbf{h}_i\}_{i=1}^{N}$, $N < \infty$, as an orthonormal basis
for $\mathcal{H}$, and by replacing infinite sums by finite ones.

A.1 $\mathcal{H}$-valued random variables

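Recall that if $X$ and $Y$ are sets, $\mathbf{f}$ is a map from $X$ into $Y$, and $B$ is a subset of
$Y$, then the set

$$\mathbf{f}^{-1}[B] = \{\mathbf{x} \in X : \mathbf{f}(\mathbf{x}) \in B\}$$

is called the inverse image of $B$ (under $\mathbf{f}$). Recall also that the event space $\mathcal{F}$ of
the probability space $(\Omega, \mathcal{F}, P)$ consists of the measurable subsets of $\Omega$, which
are called events.

Let $(Y, \mathcal{C})$ be a measurable space, i.e., $Y$ is a set and $\mathcal{C}$ is a $\sigma$-algebra of
subsets of $Y$. A map $\mathbf{f} : \Omega \to Y$ is called a $(Y, \mathcal{C})$-valued random variable if the
inverse image of every set $C$ in the collection $\mathcal{C}$ is an event, i.e., if $\mathbf{f}^{-1}[C] \in \mathcal{F}$
for every set $C \in \mathcal{C}$ (e.g. Ito (1984, p. 18), Kallianpur and Xiong (1995, p. 86);
see also Reed and Simon (1972, p. 24)).

Thus an $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$-valued random variable is a map $\mathbf{s} : \Omega \to \mathcal{H}$ such that

$$\{\omega \in \Omega : \mathbf{s}(\omega) \in B\} \in \mathcal{F}$$

for every set $B \in \mathcal{B}(\mathcal{H})$. Hereafter, an $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$-valued random variable is
called simply an $\mathcal{H}$-valued random variable, with the understanding that this
always means an $(\mathcal{H}, \mathcal{B}(\mathcal{H}))$-valued random variable. An equivalent definition
of $\mathcal{H}$-valued random variables, expressed in terms of scalar random variables, is
given in Appendix A.2.

Let $\mathcal{S}$ be a nonempty set in $\mathcal{B}(\mathcal{H})$. It follows that the collection $\mathcal{B}_\mathcal{S}(\mathcal{H})$ of
all sets in $\mathcal{B}(\mathcal{H})$ that are subsets of $\mathcal{S}$,

$$\mathcal{B}_\mathcal{S}(\mathcal{H}) = \{B \in \mathcal{B}(\mathcal{H}) : B \subset \mathcal{S}\},$$

is a $\sigma$-algebra of subsets of $\mathcal{S}$, namely, the collection of all sets $C$ of the form
$C = B \cap \mathcal{S}$ with $B \in \mathcal{B}(\mathcal{H})$. Hence $(\mathcal{S}, \mathcal{B}_\mathcal{S}(\mathcal{H}))$ is a measurable space, and an
$(\mathcal{S}, \mathcal{B}_\mathcal{S}(\mathcal{H}))$-valued random variable is a map $\mathbf{s} : \Omega \to \mathcal{S}$ such that

$$\{\omega \in \Omega : \mathbf{s}(\omega) \in C\} \in \mathcal{F}$$

for every set $C \in \mathcal{B}_\mathcal{S}(\mathcal{H})$. Hereafter, an $(\mathcal{S}, \mathcal{B}_\mathcal{S}(\mathcal{H}))$-valued random variable is
called simply an $\mathcal{S}$-valued random variable, with the understanding that this
always means an $(\mathcal{S}, \mathcal{B}_\mathcal{S}(\mathcal{H}))$-valued random variable.

It follows by definition that every $\mathcal{S}$-valued random variable is an $\mathcal{H}$-valued
random variable, for if $\mathbf{s} : \Omega \to \mathcal{S}$ and $\mathbf{s}^{-1}[C] \in \mathcal{F}$ for every set $C \in \mathcal{B}_\mathcal{S}(\mathcal{H})$, then
$\mathbf{s}^{-1}[B] = \mathbf{s}^{-1}[B \cap \mathcal{S}] \in \mathcal{F}$ for every set $B \in \mathcal{B}(\mathcal{H})$. Also, every $\mathcal{H}$-valued random
variable taking values only in $\mathcal{S}$ is an $\mathcal{S}$-valued random variable, for if $\mathbf{s} : \Omega \to \mathcal{S}$
and $\mathbf{s}^{-1}[B] \in \mathcal{F}$ for every set $B \in \mathcal{B}(\mathcal{H})$, then in particular $\mathbf{s}^{-1}[C] \in \mathcal{F}$ for every
set $C \in \mathcal{B}_\mathcal{S}(\mathcal{H})$.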

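Finally, let $N$ be a continuous map from $\mathcal{S}$ into $\mathcal{H}$. It follows that if $\mathbf{s}$ is an
$\mathcal{S}$-valued random variable, then $N(\mathbf{s})$ is an $\mathcal{H}$-valued random variable, i.e. that

$$\{\omega \in \Omega : N(\mathbf{s}(\omega)) \in B\} \in \mathcal{F}$$

for every set $B \in \mathcal{B}(\mathcal{H})$. To see this, note first that

$$\{\omega \in \Omega : N(\mathbf{s}(\omega)) \in B\} = \mathbf{s}^{-1}[N^{-1}[B]],$$

and consider the class of sets $E$ in $\mathcal{H}$ such that $N^{-1}[E] \in \mathcal{B}_\mathcal{S}(\mathcal{H})$. It can be
checked that this class of sets is a $\sigma$-algebra. Moreover, this class contains all
the open sets in $\mathcal{H}$, because if $O$ is an open set in $\mathcal{H}$ then $N^{-1}[O]$ is also an
open set in $\mathcal{H}$ by the continuity of $N$ (e.g. Reed and Simon (1972, p. 8)) and so

$$C = N^{-1}[O] = N^{-1}[O] \cap \mathcal{S} \in \mathcal{B}_\mathcal{S}(\mathcal{H}).$$

But $\mathcal{B}(\mathcal{H})$ is the smallest $\sigma$-algebra containing all the open sets in $\mathcal{H}$, hence
this class includes $\mathcal{B}(\mathcal{H})$, i.e., $N^{-1}[B] \in \mathcal{B}_\mathcal{S}(\mathcal{H})$ for every set $B \in \mathcal{B}(\mathcal{H})$. If $\mathbf{s}$ is
an $\mathcal{S}$-valued random variable then $\mathbf{s}^{-1}[C] \in \mathcal{F}$ for every set $C \in \mathcal{B}_\mathcal{S}(\mathcal{H})$, and
therefore $\mathbf{s}^{-1}[N^{-1}[B]] \in \mathcal{F}$ for every set $B \in \mathcal{B}(\mathcal{H})$, i.e., $N(\mathbf{s})$ is an $\mathcal{H}$-valued
random variable.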

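A.2 Second-order $\mathcal{H}$-valued random variables

If $\mathbf{s}$ is an $\mathcal{H}$-valued random variable and $\mathbf{h} \in \mathcal{H}$, then by the Schwarz inequality,

$$|(\mathbf{h}, \mathbf{s}(\omega))| \le \|\mathbf{h}\| \, \|\mathbf{s}(\omega)\| < \infty \tag{24}$$

for all $\omega \in \Omega$, so for each fixed $\mathbf{h} \in \mathcal{H}$, the inner product $(\mathbf{h}, \mathbf{s})$ is a map from $\Omega$
into $\mathbb{R}$. In fact, it can be shown (e.g. Kallianpur and Xiong (1995, Corollary
3.1.1(b), p. 87)) that a map $\mathbf{s} : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable if, and
only if, $(\mathbf{h}, \mathbf{s})$ is a scalar random variable for every $\mathbf{h} \in \mathcal{H}$. That is, a map
$\mathbf{s} : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable if, and only if,

$$\{\omega \in \Omega : (\mathbf{h}, \mathbf{s}(\omega)) \le \alpha\} \in \mathcal{F}$$

for every $\mathbf{h} \in \mathcal{H}$ and every $\alpha \in \mathbb{R}$.

It follows that if $\mathbf{s}$ is an $\mathcal{H}$-valued random variable, then $\|\mathbf{s}\|^2$ is a scalar
random variable, that is,

$$\{\omega \in \Omega : \|\mathbf{s}(\omega)\|^2 \le \alpha\} \in \mathcal{F}$$

for every $\alpha \in \mathbb{R}$. To see this, observe that if $\mathbf{s}$ is an $\mathcal{H}$-valued random variable,
then $(\mathbf{h}_i, \mathbf{s})$ for $i = 1, 2, \ldots$ are scalar random variables, hence

$$s_n = \sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{s})^2$$

are scalar random variables with $0 \le s_n \le s_{n+1}$ for $n = 1, 2, \ldots$, and by Parseval's
relation,

$$\|\mathbf{s}(\omega)\|^2 = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{s}(\omega))^2 = \lim_{n \to \infty} s_n(\omega)$$

for all $\omega \in \Omega$. Thus $\|\mathbf{s}\|^2$ is the limit of an increasing sequence of nonnegative
scalar random variables, and is therefore a (nonnegative) scalar random variable.

If a map $\mathbf{s} : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable, then since $\|\mathbf{s}\|^2 \ge 0$ is a
scalar random variable, it follows that $\mathcal{E}\|\mathbf{s}\|^2$ is defined and either $\mathcal{E}\|\mathbf{s}\|^2 = \infty$ or
$\mathcal{E}\|\mathbf{s}\|^2 < \infty$. An $\mathcal{H}$-valued random variable $\mathbf{s}$ is called second-order if $\mathcal{E}\|\mathbf{s}\|^2 < \infty$.

A.3 Properties of second-order $\mathcal{H}$-valued random variables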

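In this subsection let $\mathbf{s} : \Omega \to \mathcal{H}$ be a second-order $\mathcal{H}$-valued random variable.
Since $\mathcal{E}\|\mathbf{s}\|^2 < \infty$, it follows from Eq. (24) that

$$\mathcal{E}(\mathbf{h}, \mathbf{s})^2 \le \|\mathbf{h}\|^2 \, \mathcal{E}\|\mathbf{s}\|^2 < \infty \tag{25}$$

for each $\mathbf{h} \in \mathcal{H}$. Thus, for each $\mathbf{h} \in \mathcal{H}$, $(\mathbf{h}, \mathbf{s})$ is a second-order scalar random
variable, and therefore its mean is defined and finite. The mean of $(\mathbf{h}, \mathbf{s})$ will be
denoted by

$$m[\mathbf{h}] = \mathcal{E}(\mathbf{h}, \mathbf{s}),$$

for each $\mathbf{h} \in \mathcal{H}$. Since $\mathcal{E}\|\mathbf{s}\|^2 < \infty$, $\|\mathbf{s}\|$ is a second-order scalar random variable,
and its mean $M = \mathcal{E}\|\mathbf{s}\|$ satisfies $0 \le M \le (\mathcal{E}\|\mathbf{s}\|^2)^{1/2} < \infty$. Now

$$|m[\mathbf{h}]| = |\mathcal{E}(\mathbf{h}, \mathbf{s})| \le \mathcal{E}|(\mathbf{h}, \mathbf{s})| \le M \|\mathbf{h}\|$$

for each $\mathbf{h} \in \mathcal{H}$, by Eq. (24), and also

$$m[\alpha \mathbf{g} + \beta \mathbf{h}] = \alpha \, m[\mathbf{g}] + \beta \, m[\mathbf{h}]$$

for each $\mathbf{g}, \mathbf{h} \in \mathcal{H}$ and $\alpha, \beta \in \mathbb{R}$. Thus $m[\cdot]$ is a bounded linear functional on $\mathcal{H}$,
and by the Riesz representation theorem for Hilbert space (e.g. Royden (1968,
p. 213), Reed and Simon (1972, p. 43)) this implies that there exists a unique
element $\bar{\mathbf{s}} \in \mathcal{H}$, called the mean of $\mathbf{s}$, such that

$$m[\mathbf{h}] = (\mathbf{h}, \bar{\mathbf{s}})$$

for each $\mathbf{h} \in \mathcal{H}$. Thus the mean $\bar{\mathbf{s}}$ of $\mathbf{s}$ is defined uniquely in $\mathcal{H}$, and satisfies
$(\mathbf{h}, \bar{\mathbf{s}}) = \mathcal{E}(\mathbf{h}, \mathbf{s})$ for every $\mathbf{h} \in \mathcal{H}$.

Now let $\mathbf{s}'(\omega) = \mathbf{s}(\omega) - \bar{\mathbf{s}}$ for each $\omega \in \Omega$. Since

$$\|\mathbf{s}'(\omega)\| \le \|\mathbf{s}(\omega)\| + \|\bar{\mathbf{s}}\| < \infty$$

for each $\omega \in \Omega$, $\mathbf{s}' = \mathbf{s} - \bar{\mathbf{s}}$ is a map from $\Omega$ into $\mathcal{H}$. Furthermore, for every
$\mathbf{h} \in \mathcal{H}$, $(\mathbf{h}, \mathbf{s})$ is a scalar random variable, $|(\mathbf{h}, \bar{\mathbf{s}})| < \infty$, and $|(\mathbf{h}, \mathbf{s}(\omega))| < \infty$ for
each $\omega \in \Omega$. Therefore $(\mathbf{h}, \mathbf{s}') = (\mathbf{h}, \mathbf{s}) - (\mathbf{h}, \bar{\mathbf{s}})$ is a scalar random variable for
every $\mathbf{h} \in \mathcal{H}$, and hence $\mathbf{s}'$ is an $\mathcal{H}$-valued random variable. Also,

$$\mathcal{E}(\mathbf{h}, \mathbf{s}') = \mathcal{E}(\mathbf{h}, \mathbf{s}) - (\mathbf{h}, \bar{\mathbf{s}}) = 0$$

for every $\mathbf{h} \in \mathcal{H}$, so the mean of $\mathbf{s}'$ is $\mathbf{0} \in \mathcal{H}$. Thus

$$\mathcal{E}\|\mathbf{s}\|^2 = \mathcal{E}(\bar{\mathbf{s}} + \mathbf{s}', \bar{\mathbf{s}} + \mathbf{s}') = \|\bar{\mathbf{s}}\|^2 + \mathcal{E}\|\mathbf{s}'\|^2 \tag{26}$$

and, in particular, $\mathcal{E}\|\mathbf{s}'\|^2 \le \mathcal{E}\|\mathbf{s}\|^2 < \infty$. Therefore $\mathbf{s}' : \Omega \to \mathcal{H}$ is a second-order
$\mathcal{H}$-valued random variable, and $\|\mathbf{s}'\|$ is a second-order scalar random variable.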

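Since $\mathbf{s}'$ is a second-order $\mathcal{H}$-valued random variable, $(\mathbf{g}, \mathbf{s}')$ and $(\mathbf{h}, \mathbf{s}')$ are
second-order scalar random variables, for each $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. Therefore the expectation

$$C[\mathbf{g}, \mathbf{h}] = \mathcal{E}(\mathbf{g}, \mathbf{s}')(\mathbf{h}, \mathbf{s}')$$

is defined for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$, and in fact

$$|C[\mathbf{g}, \mathbf{h}]| \le \mathcal{E}|(\mathbf{g}, \mathbf{s}')(\mathbf{h}, \mathbf{s}')| \le [\mathcal{E}(\mathbf{g}, \mathbf{s}')^2]^{1/2} [\mathcal{E}(\mathbf{h}, \mathbf{s}')^2]^{1/2} \le \|\mathbf{g}\| \, \|\mathbf{h}\| \, \mathcal{E}\|\mathbf{s}'\|^2.$$

The functional $C$, called the covariance functional of $\mathbf{s}$, is also linear in its two
arguments. Thus $C[\cdot, \cdot]$ is a bounded bilinear functional on $\mathcal{H} \times \mathcal{H}$. It follows
(e.g. Rudin (1991, Theorem 12.8, p. 310)) that there exists a unique bounded
linear operator $\mathcal{P} : \mathcal{H} \to \mathcal{H}$, called the covariance operator of $\mathbf{s}$, such that

$$C[\mathbf{g}, \mathbf{h}] = (\mathbf{g}, \mathcal{P}\mathbf{h})$$

for each $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. The covariance operator $\mathcal{P}$ is self-adjoint, i.e., $(\mathcal{P}\mathbf{g}, \mathbf{h}) =
(\mathbf{g}, \mathcal{P}\mathbf{h})$ for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$, since the covariance functional is symmetric, $C[\mathbf{h}, \mathbf{g}] =
C[\mathbf{g}, \mathbf{h}]$ for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. The covariance operator is also positive semidefinite,
i.e., $(\mathbf{h}, \mathcal{P}\mathbf{h}) \ge 0$ for all $\mathbf{h} \in \mathcal{H}$, since

$$(\mathbf{h}, \mathcal{P}\mathbf{h}) = C[\mathbf{h}, \mathbf{h}] = \mathcal{E}(\mathbf{h}, \mathbf{s}')^2 \ge 0$$

for all $\mathbf{h} \in \mathcal{H}$.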

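Now consider the second-order scalar random variable $\|\mathbf{s}'\|$. By Parseval's
relation,

$$\|\mathbf{s}'(\omega)\|^2 = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{s}'(\omega))^2$$

for all $\omega \in \Omega$, and therefore

$$\mathcal{E}\|\mathbf{s}'\|^2 = \sum_{i=1}^{\infty} \mathcal{E}(\mathbf{h}_i, \mathbf{s}')^2$$

because $\{(\mathbf{h}_i, \mathbf{s}')^2\}_{i=1}^{\infty}$ is a sequence of nonnegative random variables. Furthermore,

$$\mathcal{E}(\mathbf{h}_i, \mathbf{s}')^2 = (\mathbf{h}_i, \mathcal{P}\mathbf{h}_i) \tag{27}$$

for $i = 1, 2, \ldots$, by definition of the covariance operator $\mathcal{P}$, and therefore

$$\mathcal{E}\|\mathbf{s}'\|^2 = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathcal{P}\mathbf{h}_i).$$

The summation on the right-hand side, called the trace of $\mathcal{P}$ and written $\operatorname{tr} \mathcal{P}$, is
independent of the choice of orthonormal basis $\{\mathbf{h}_i\}_{i=1}^{\infty}$ for $\mathcal{H}$, for any positive
semidefinite bounded linear operator from $\mathcal{H}$ into $\mathcal{H}$ (e.g. Reed and Simon (1972,
Theorem VI.18, p. 206)). Thus

$$\operatorname{tr} \mathcal{P} = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathcal{P}\mathbf{h}_i) = \mathcal{E}\|\mathbf{s}'\|^2 < \infty,$$

and Eq. (26) can be written as

$$\mathcal{E}\|\mathbf{s}\|^2 = \|\bar{\mathbf{s}}\|^2 + \operatorname{tr} \mathcal{P}, \tag{28}$$

which is a generalization of Eq. (42) to second-order $\mathcal{H}$-valued random variables.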

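Since $\operatorname{tr} \mathcal{P} < \infty$, $\mathcal{P}$ is a trace class operator, and therefore also a compact
operator (e.g. Reed and Simon (1972, Theorem VI.21, p. 209)). Since $\mathcal{P}$ is
self-adjoint in addition to being compact, it follows from the Hilbert-Schmidt
theorem (e.g. Reed and Simon (1972, Theorem VI.16, p. 203)) that there exists
an orthonormal basis for $\mathcal{H}$ which consists of eigenvectors $\{\tilde{\mathbf{h}}_i\}_{i=1}^{\infty}$ of $\mathcal{P}$,

$$\mathcal{P}\tilde{\mathbf{h}}_i = \lambda_i \tilde{\mathbf{h}}_i$$

for $i = 1, 2, \ldots$, where the corresponding eigenvalues $\lambda_i = (\tilde{\mathbf{h}}_i, \mathcal{P}\tilde{\mathbf{h}}_i)$ for $i =
1, 2, \ldots$ are all real numbers and satisfy $\lambda_i \to 0$ as $i \to \infty$. In fact, the eigenvalues
are all nonnegative since $\mathcal{P}$ is positive semidefinite, and therefore $\lambda_i = \|\mathcal{P}\tilde{\mathbf{h}}_i\|$
for $i = 1, 2, \ldots$. Further, it follows from Eq. (27) that

$$\lambda_i = (\tilde{\mathbf{h}}_i, \mathcal{P}\tilde{\mathbf{h}}_i) = \mathcal{E}(\tilde{\mathbf{h}}_i, \mathbf{s}')^2 = \sigma_i^2,$$

where $\sigma_i^2$ is the variance of the scalar random variable $(\tilde{\mathbf{h}}_i, \mathbf{s})$, for $i = 1, 2, \ldots$.
By the definition of $\operatorname{tr} \mathcal{P}$,

$$\mathcal{E}\|\mathbf{s}'\|^2 = \operatorname{tr} \mathcal{P} = \sum_{i=1}^{\infty} (\tilde{\mathbf{h}}_i, \mathcal{P}\tilde{\mathbf{h}}_i) = \sum_{i=1}^{\infty} \lambda_i < \infty.$$

Thus the eigenvalues $\{\lambda_i\}_{i=1}^{\infty}$ of $\mathcal{P}$ are the variances $\{\sigma_i^2\}_{i=1}^{\infty}$ and have finite sum
$\operatorname{tr} \mathcal{P}$. Equation (28) can then be rewritten as

$$\mathcal{E}\|\mathbf{s}\|^2 = \|\bar{\mathbf{s}}\|^2 + \sum_{i=1}^{\infty} \sigma_i^2, \tag{29}$$

which is another generalization of Eq. (42).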
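
The decomposition of the mean square norm into the squared norm of the mean plus a sum of variances has an exact sample analogue, which makes it easy to check numerically. The sketch below assumes a finite-dimensional stand-in for $\mathcal{H}$ (vectors of length 6), a hypothetical sampling law, and the coordinate basis in place of the eigenvector basis; since the trace is basis-independent, the sum of coordinate variances plays the role of $\operatorname{tr} \mathcal{P}$. Variances are normalized by the sample size so that the identity holds exactly for sample statistics.

```python
import random


def second_moment_split(samples):
    """Return (mean of ||s||^2, ||sample mean||^2, sum of coordinate variances)."""
    n = len(samples)
    dim = len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(dim)]
    variances = [sum((s[i] - mean[i]) ** 2 for s in samples) / n
                 for i in range(dim)]
    mean_sq_norm = sum(sum(x * x for x in s) for s in samples) / n
    return mean_sq_norm, sum(m * m for m in mean), sum(variances)


rng = random.Random(1)   # fixed seed for reproducibility
# Hypothetical ensemble: coordinate i has mean 1/(i+1) and spread 1/(i+1).
ENSEMBLE = [[1.0 / (i + 1) + rng.gauss(0.0, 1.0 / (i + 1)) for i in range(6)]
            for _ in range(500)]
TOTAL, MEAN_PART, TRACE_PART = second_moment_split(ENSEMBLE)
```

With population quantities in place of sample quantities the same split is Eq. (28); the sample version is an algebraic identity and therefore holds regardless of the ensemble size.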

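Since every $\mathbf{h} \in \mathcal{H}$ has the representation $\mathbf{h} = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) \mathbf{h}_i$, and since
$\mathcal{P}\mathbf{h} \in \mathcal{H}$ for every $\mathbf{h} \in \mathcal{H}$, taking $\mathbf{h}_i = \tilde{\mathbf{h}}_i$ and using the fact that

$$(\mathbf{h}_i, \mathcal{P}\mathbf{h}) = (\mathcal{P}\mathbf{h}_i, \mathbf{h}) = \lambda_i (\mathbf{h}_i, \mathbf{h}) = \sigma_i^2 (\mathbf{h}_i, \mathbf{h})$$

for $i = 1, 2, \ldots$, gives the following representation for $\mathcal{P}$:

$$\mathcal{P}\mathbf{h} = \sum_{i=1}^{\infty} \sigma_i^2 (\mathbf{h}_i, \mathbf{h}) \mathbf{h}_i \tag{30}$$

for every $\mathbf{h} \in \mathcal{H}$. Thus the expectation $\mathcal{E}(\mathbf{g}, \mathbf{s}')(\mathbf{h}, \mathbf{s}')$ is given by the convergent
series

$$\mathcal{E}(\mathbf{g}, \mathbf{s}')(\mathbf{h}, \mathbf{s}') = C[\mathbf{g}, \mathbf{h}] = (\mathbf{g}, \mathcal{P}\mathbf{h}) = \sum_{i=1}^{\infty} \sigma_i^2 (\mathbf{h}_i, \mathbf{g})(\mathbf{h}_i, \mathbf{h}),$$

for every $\mathbf{g}, \mathbf{h} \in \mathcal{H}$.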

Finally, since $\mathcal{P}$ is a positive semidefinite bounded linear operator from $\mathcal{H}$
into $\mathcal{H}$, there exists a unique positive semidefinite bounded linear operator
$\mathcal{P}^{1/2} : \mathcal{H} \to \mathcal{H}$, called the square root of $\mathcal{P}$, that satisfies $(\mathcal{P}^{1/2})^2 = \mathcal{P}$ (e.g. Reed and
Simon (1972, Theorem VI.9, p. 196)). Since $\mathcal{P}$ is also self-adjoint and trace class,
$\mathcal{P}^{1/2}$ is self-adjoint and Hilbert-Schmidt (e.g. Reed and Simon (1972, p. 210)),
with the same eigenvectors as $\mathcal{P}$ and with eigenvalues that are the nonnegative
square roots of the corresponding eigenvalues of $\mathcal{P}$. That is,

$$\mathcal{P}^{1/2} \mathbf{h}_i = \sigma_i \mathbf{h}_i,$$

where $\sigma_i = \lambda_i^{1/2} = [\mathcal{E}(\mathbf{h}_i, \mathbf{s}')^2]^{1/2}$, for $i = 1, 2, \ldots$. Therefore $\sigma_i = (\mathbf{h}_i, \mathcal{P}^{1/2}\mathbf{h}_i) =
\|\mathcal{P}^{1/2}\mathbf{h}_i\|$ for $i = 1, 2, \ldots$, and $\mathcal{P}^{1/2}$ has the representation

$$\mathcal{P}^{1/2}\mathbf{h} = \sum_{i=1}^{\infty} \sigma_i (\mathbf{h}_i, \mathbf{h}) \mathbf{h}_i \tag{31}$$

for every $\mathbf{h} \in \mathcal{H}$.

A.4 Construction of second-order $\mathcal{H}$-valued random variables

It will now be shown how essentially all second-order $\mathcal{H}$-valued random variables
can be constructed. This will be accomplished by first reconsidering, in
a suggestive notation, the defining properties of every second-order $\mathcal{H}$-valued
random variable. The construction given here is by Ito's regularization theorem
(Ito (1984, Theorem 2.3.3, p. 27), Kallianpur and Xiong (1995, Theorem 3.1.2,
p. 87)) applied to $\mathcal{H}$, and amounts to formalizing on $\mathcal{H}$ the usual construction
of infinite-dimensional random variables through random Fourier series.

For the moment, fix a second-order $\mathcal{H}$-valued random variable $\mathbf{s}$, and consider
the behavior of

$$s[\mathbf{h}] = (\mathbf{h}, \mathbf{s})$$

as a functional of $\mathbf{h} \in \mathcal{H}$, that is, as $\mathbf{h}$ varies throughout $\mathcal{H}$. The functional $s[\cdot]$
has three important properties. First, on evaluation at any $\mathbf{h} \in \mathcal{H}$, it is a scalar
random variable, with

$$s[\mathbf{h}](\omega) = (\mathbf{h}, \mathbf{s}(\omega))$$

for each $\omega \in \Omega$, since $\mathbf{s} : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable. Thus $s[\cdot]$ is a
map from $\mathcal{H}$ into the set of scalar random variables on $(\Omega, \mathcal{F}, P)$. Second, this
map is linear,

$$s[\alpha \mathbf{g} + \beta \mathbf{h}] = \alpha \, s[\mathbf{g}] + \beta \, s[\mathbf{h}]$$

for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$ and $\alpha, \beta \in \mathbb{R}$, by linearity of the inner product. Third, according
to Eq. (25),

$$(\mathcal{E} s^2[\mathbf{h}])^{1/2} \le \gamma \|\mathbf{h}\|, \tag{32}$$

where

$$\gamma = (\mathcal{E}\|\mathbf{s}\|^2)^{1/2} < \infty$$

since the $\mathcal{H}$-valued random variable $\mathbf{s}$ is second-order. Thus $s[\cdot]$ is a linear map
from $\mathcal{H}$ into the set of second-order scalar random variables on $(\Omega, \mathcal{F}, P)$.

Now recall the space $L^2(\Omega, \mathcal{F}, P)$, whose elements are the equivalence classes
of second-order scalar random variables, where two scalar random variables are
called equivalent if they are equal wp1. The space $L^2(\Omega, \mathcal{F}, P)$ is a Hilbert
space, with the inner product of any two elements $r, s \in L^2(\Omega, \mathcal{F}, P)$ given
by $\mathcal{E} r s$ and the corresponding norm of any element $s \in L^2(\Omega, \mathcal{F}, P)$ given by
$(\mathcal{E} s^2)^{1/2}$. Inequality (32) states that the functional $s[\cdot]$ is bounded, when viewed
as a map from $\mathcal{H}$ into $L^2(\Omega, \mathcal{F}, P)$.

A map $s[\cdot]$ from $\mathcal{H}$ into the set of scalar random variables on $(\Omega, \mathcal{F}, P)$, which
is linear in the sense that if $\mathbf{g}, \mathbf{h} \in \mathcal{H}$ and $\alpha, \beta \in \mathbb{R}$ then

$$s[\alpha \mathbf{g} + \beta \mathbf{h}] = \alpha \, s[\mathbf{g}] + \beta \, s[\mathbf{h}] \quad \text{wp1},$$

is called a random linear functional (e.g. Ito (1984, p. 22), Omatu and Seinfeld
(1989, p. 48)). Observe that the set of $\omega \in \Omega$ of probability measure zero where
linearity fails to hold can depend on $\alpha$, $\beta$, $\mathbf{g}$ and $\mathbf{h}$. If linearity holds for all
$\omega \in \Omega$, for all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$ and $\alpha, \beta \in \mathbb{R}$, then the random linear functional is called
perfect. If $s[\cdot]$ is a random linear functional and there is a constant $\gamma \in \mathbb{R}$ such
that Eq. (32) holds for all $\mathbf{h} \in \mathcal{H}$, then the random linear functional is called
second-order. Thus, given any particular $\mathcal{H}$-valued random variable $\mathbf{s}$, the map
$s[\cdot]$ defined for all $\mathbf{h} \in \mathcal{H}$ by $s[\mathbf{h}] = (\mathbf{h}, \mathbf{s})$ is a perfect random linear functional,
and if $\mathbf{s}$ is second-order then so is $s[\cdot]$.

Now it will be shown that a random linear functional $s[\cdot]$ is second-order if,
and only if,

$$ \sum_{i=1}^{\infty} \mathcal{E} s^2[\mathbf{h}_i] < \infty. \tag{33} $$

In particular, a collection $\{s_i\}_{i=1}^{\infty}$ of scalar random variables with
$\sum_{i=1}^{\infty} \mathcal{E}s_i^2 < \infty$ can be used to define a second-order random linear functional,
by setting $s[\mathbf{h}_i] = s_i$ for $i = 1, 2, \ldots$. It will then be shown how to construct, from
any given second-order random linear functional $s[\cdot]$, a second-order $\mathcal{H}$-valued
random variable $\mathbf{s}$ such that, for all $\mathbf{h} \in \mathcal{H}$,

$$ (\mathbf{h}, \mathbf{s}) = s[\mathbf{h}] \quad \text{wp1}. $$

Such an $\mathcal{H}$-valued random variable $\mathbf{s}$ is called a regularized version of the random
linear functional $s[\cdot]$ (Ito (1984, Definition 2.3.2, p. 23)).

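Let $s[\cdot]$ be a second-order random linear functional. Given any $\mathbf{h} \in \mathcal{H}$ and
positive integer $n$, it follows from the linearity of $s[\cdot]$ that

$$ s\left[\sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h})\mathbf{h}_i\right] = \sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i] \quad \text{wp1}, $$

where the set of probability measure zero on which equality does not hold may
depend on $\mathbf{h}$ and on the orthonormal basis elements $\{\mathbf{h}_i\}_{i=1}^{n}$. By the boundedness
of $s[\cdot]$ it follows that

$$ \mathcal{E}\left(\sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]\right)^2 \le \gamma^2 \left\|\sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h})\mathbf{h}_i\right\|^2 $$

for some constant $\gamma \in \mathbb{R}$ which is independent of $\mathbf{h}$ and $n$. Taking the limit as
$n \to \infty$ gives

$$ \mathcal{E}\left(\sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]\right)^2 \le \gamma^2 \|\mathbf{h}\|^2 < \infty, $$

for all $\mathbf{h} \in \mathcal{H}$. Thus the series $\sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]$ converges in $L^2(\Omega, \mathcal{F}, P)$, i.e.,
there exists a unique element $\tilde{s}[\mathbf{h}] \in L^2(\Omega, \mathcal{F}, P)$ such that

$$ \lim_{n \to \infty} \mathcal{E}\left(\tilde{s}[\mathbf{h}] - \sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]\right)^2 = 0, $$

for all $\mathbf{h} \in \mathcal{H}$. Equivalently, since a series converges in a Hilbert space if, and
only if, it converges in norm,

$$ \sum_{i=1}^{\infty} \left\{\mathcal{E}\left[(\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]\right]^2\right\}^{1/2} = \sum_{i=1}^{\infty} |(\mathbf{h}_i, \mathbf{h})| \left(\mathcal{E}s^2[\mathbf{h}_i]\right)^{1/2} < \infty, $$

for all $\mathbf{h} \in \mathcal{H}$. By the Riesz representation theorem applied to the Hilbert space
of square-summable sequences of real numbers, and since

$$ \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h})^2 = \|\mathbf{h}\|^2 $$

by Parseval's relation, the series $\sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]$ therefore converges in
$L^2(\Omega, \mathcal{F}, P)$, for all $\mathbf{h} \in \mathcal{H}$, if, and only if, Eq. (33) holds, in which case

$$ \sum_{i=1}^{\infty} |(\mathbf{h}_i, \mathbf{h})| \left(\mathcal{E}s^2[\mathbf{h}_i]\right)^{1/2} \le \|\mathbf{h}\| \left\{\sum_{i=1}^{\infty} \mathcal{E}s^2[\mathbf{h}_i]\right\}^{1/2} < \infty, $$

by the Schwarz inequality. Thus, if $s[\cdot]$ is a second-order random linear functional,
then Eq. (33) holds, for every orthonormal basis $\{\mathbf{h}_i\}_{i=1}^{\infty}$ of $\mathcal{H}$.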

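Conversely, suppose that Eq. (33) holds for a random linear functional $s[\cdot]$,
for some orthonormal basis $\{\mathbf{h}_i\}_{i=1}^{\infty}$ of $\mathcal{H}$. Since every $\mathbf{h} \in \mathcal{H}$ has the
representation $\mathbf{h} = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h})\mathbf{h}_i$, it follows from the linearity of $s[\cdot]$ that if
$\mathbf{h} \in \mathcal{H}$ then

$$ s[\mathbf{h}] = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i] \quad \text{wp1}, $$

and therefore

$$ s^2[\mathbf{h}] \le \|\mathbf{h}\|^2 \sum_{i=1}^{\infty} s^2[\mathbf{h}_i] \quad \text{wp1}, $$

by the Schwarz inequality and Parseval's relation. Thus

$$ \mathcal{E}s^2[\mathbf{h}] \le \|\mathbf{h}\|^2 \sum_{i=1}^{\infty} \mathcal{E}s^2[\mathbf{h}_i], $$

for every $\mathbf{h} \in \mathcal{H}$, i.e., Eq. (32) holds with

$$ \gamma^2 = \sum_{i=1}^{\infty} \mathcal{E}s^2[\mathbf{h}_i] < \infty, $$

by Eq. (33), and therefore $s[\cdot]$ is a second-order random linear functional.
Furthermore, since $s[\cdot]$ is a second-order random linear functional, Eq. (33) holds
for every orthonormal basis $\{\mathbf{h}_i\}_{i=1}^{\infty}$ of $\mathcal{H}$.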

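Now let $s[\cdot]$ be a given second-order random linear functional. Since

$$ \mathcal{E}\sum_{i=1}^{\infty} s^2[\mathbf{h}_i] = \sum_{i=1}^{\infty} \mathcal{E}s^2[\mathbf{h}_i] < \infty, $$

the sum $\sum_{i=1}^{\infty} s^2[\mathbf{h}_i]$ must be finite wp1, i.e., if

$$ E = \left\{\omega \in \Omega : \sum_{i=1}^{\infty} s^2[\mathbf{h}_i](\omega) < \infty\right\} $$

then $E \in \mathcal{F}$ and $P(E) = 1$, where the set $E$ may depend on $\{\mathbf{h}_i\}_{i=1}^{\infty}$. Define
$\mathbf{s}(\omega)$ for each $\omega \in \Omega$ by

$$ \mathbf{s}(\omega) = \begin{cases} \sum_{i=1}^{\infty} \mathbf{h}_i s[\mathbf{h}_i](\omega) & \text{if } \omega \in E \\ \mathbf{0} & \text{if } \omega \notin E \end{cases}. $$

By Parseval's relation it follows that

$$ \|\mathbf{s}(\omega)\|^2 = \begin{cases} \sum_{i=1}^{\infty} s^2[\mathbf{h}_i](\omega) & \text{if } \omega \in E \\ 0 & \text{if } \omega \notin E \end{cases}, $$

and therefore $\|\mathbf{s}(\omega)\|^2 < \infty$ for all $\omega \in \Omega$. Thus $\mathbf{s}$ is a map from $\Omega$ into $\mathcal{H}$, and
for any $\mathbf{h} \in \mathcal{H}$,

$$ (\mathbf{h}, \mathbf{s}(\omega)) = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i](\omega) \tag{34} $$

for each $\omega \in E$. Now, if $\mathbf{h} \in \mathcal{H}$ then

$$ s[\mathbf{h}] = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i] \quad \text{wp1}, $$

and so there is a set $E_\mathbf{h} \in \mathcal{F}$ with $P(E_\mathbf{h}) = 1$, that may depend on $\{\mathbf{h}_i\}_{i=1}^{\infty}$ as
well as on $\mathbf{h}$, such that

$$ s[\mathbf{h}](\omega) = \sum_{i=1}^{\infty} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i](\omega) $$

for each $\omega \in E_\mathbf{h}$. Therefore, for all $\mathbf{h} \in \mathcal{H}$,

$$ (\mathbf{h}, \mathbf{s}(\omega)) = s[\mathbf{h}](\omega) $$

for each $\omega \in E \cap E_\mathbf{h}$, and $P(E \cap E_\mathbf{h}) = 1$. Since the probability space $(\Omega, \mathcal{F}, P)$
was assumed to be complete, and since $s[\mathbf{h}]$ is a scalar random variable for
each $\mathbf{h} \in \mathcal{H}$, it follows that $(\mathbf{h}, \mathbf{s})$ is a scalar random variable for each $\mathbf{h} \in \mathcal{H}$.
Therefore the map $\mathbf{s} : \Omega \to \mathcal{H}$ is an $\mathcal{H}$-valued random variable. Since

$$ \mathcal{E}\|\mathbf{s}\|^2 = \mathcal{E}\sum_{i=1}^{\infty} s^2[\mathbf{h}_i] = \sum_{i=1}^{\infty} \mathcal{E}s^2[\mathbf{h}_i] < \infty, $$

$\mathbf{s}$ is a second-order $\mathcal{H}$-valued random variable. Since $s[\cdot]$ is bounded as a map
from $\mathcal{H}$ into $L^2(\Omega, \mathcal{F}, P)$,

$$ \lim_{n \to \infty} \mathcal{E}\left(s[\mathbf{h}] - \sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h}) s[\mathbf{h}_i]\right)^2 = 0 $$

for all $\mathbf{h} \in \mathcal{H}$, and since Eq. (34) holds for all $\mathbf{h} \in \mathcal{H}$ and $\omega \in E$, it follows that

$$ \mathcal{E}\left(s[\mathbf{h}] - (\mathbf{h}, \mathbf{s})\right)^2 = 0 $$

for all $\mathbf{h} \in \mathcal{H}$. Therefore, for all $\mathbf{h} \in \mathcal{H}$, $(\mathbf{h}, \mathbf{s}) = s[\mathbf{h}]$ wp1.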
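The construction above can be illustrated numerically. The following is a minimal sketch, not part of the original argument, under several assumptions that are made only for illustration: the space is truncated to n dimensions with the standard basis playing the role of the orthonormal basis, and the scalar random variables s_i = s[h_i] are taken to be independent, zero-mean Gaussians with standard deviations sigma_i = 1/i, so that the sum in Eq. (33) is finite. The function names are hypothetical.

```python
import math
import random

def gamma_squared(sigmas):
    """gamma^2 = sum_i E s_i^2, the sum required to be finite in Eq. (33)."""
    return sum(sig * sig for sig in sigmas)

def functional_second_moment(h, sigmas):
    """E s^2[h] for s[h] = sum_i (h_i, h) s_i.

    Assumes (for illustration only) independent, zero-mean s_i with
    standard deviations sigma_i, so cross terms vanish in expectation.
    """
    return sum((c * sig) ** 2 for c, sig in zip(h, sigmas))

def sample_regularized_version(sigmas, rng):
    """One draw of s = sum_i h_i s_i, stored as its coefficients s_i."""
    return [sig * rng.gauss(0.0, 1.0) for sig in sigmas]

def mean_norm_squared(sigmas, n_samples, seed=0):
    """Monte Carlo estimate of E ||s||^2 = sum_i E s_i^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s = sample_regularized_version(sigmas, rng)
        total += sum(x * x for x in s)
    return total / n_samples

n = 50
sigmas = [1.0 / i for i in range(1, n + 1)]       # assumed decay, sum of squares finite
h = [math.cos(i) / i for i in range(1, n + 1)]    # an arbitrary test element
h_norm_sq = sum(c * c for c in h)

# Boundedness, Eq. (32): E s^2[h] <= gamma^2 ||h||^2
print(functional_second_moment(h, sigmas), gamma_squared(sigmas) * h_norm_sq)
# Second-order property of the regularized version: E ||s||^2 = gamma^2
print(mean_norm_squared(sigmas, 5000), gamma_squared(sigmas))
```

In this truncated setting the identity (h, s) = s[h] holds exactly for every draw, so the sketch only illustrates the two moment statements, not the measure-theoretic subtleties of the exceptional sets.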

B The Hilbert spaces $\Phi_p$

Let $\mathcal{H}$ be a real, separable Hilbert space, with inner product and corresponding
norm denoted by $(\cdot, \cdot)$ and $\|\cdot\|$, respectively. Denote by $\mathcal{B}(\mathcal{H})$ the Borel field
generated by the open sets in $\mathcal{H}$. For convenience it will be assumed in this
Appendix that $\mathcal{H}$ is infinite-dimensional.

Appendix B.1 uses a self-adjoint linear operator on $\mathcal{H}$ to construct a special
family of Hilbert spaces $\{\Phi_p, p \ge 0\}$. The inner product and corresponding
norm on $\Phi_p$ are denoted by $(\cdot, \cdot)_p$ and $\|\cdot\|_p$, respectively, for each $p \ge 0$. These
Hilbert spaces have the following properties: (i) $\Phi_0 = \mathcal{H}$; (ii) for each $p > 0$,
$\Phi_p \subset \mathcal{H}$, and therefore $\Phi_p$ is real and separable; (iii) for each $p > 0$, $\Phi_p$ is
dense in $\mathcal{H}$, and therefore is infinite-dimensional; and (iv) if $0 \le q \le r$, then
$\|\mathbf{h}\| = \|\mathbf{h}\|_0 \le \|\mathbf{h}\|_q \le \|\mathbf{h}\|_r$ for all $\mathbf{h} \in \Phi_r$, and therefore $\mathcal{H} = \Phi_0 \supset \Phi_q \supset \Phi_r$.
In view of property (iv), the family $\{\Phi_p, p \ge 0\}$ is called a decreasing family of
Hilbert spaces. The construction given here follows closely that of Kallianpur
and Xiong (1995, Example 1.3.2, pp. 40-42). For various concrete examples and
classical applications of decreasing families of Hilbert spaces constructed in this
way, see Reed and Simon (1972, pp. 141-145), Ito (1984, pp. 1-12), Kallianpur
and Xiong (1995, pp. 29-40), and Lax (2006, pp. 61-67).
Appendix B.2 discusses the spaces $\Phi_p$ in case $\mathcal{H} = L^2(S)$, the space of
square-integrable vector or scalar fields on the sphere $S$, when the operator $L$
used in the construction of the spaces $\Phi_p$ is taken to be $L = -\Delta$, where $\Delta$ is
the Laplacian operator on the sphere.

B.1 Construction of the Hilbert spaces $\Phi_p$

Let $L$ be a densely defined, positive semidefinite, self-adjoint linear operator
on $\mathcal{H}$, and let $I$ denote the identity operator on $\mathcal{H}$. It follows from elementary
arguments (e.g. Riesz and Sz.-Nagy (1955, p. 324)) that the inverse operator
$(I + L)^{-1}$ is a bounded, positive semidefinite, self-adjoint linear operator defined
on all of $\mathcal{H}$, in fact with

$$ \|(I + L)^{-1}\mathbf{h}\| \le \|\mathbf{h}\| $$

for all $\mathbf{h} \in \mathcal{H}$. Assume that some power $p_1 > 0$ of $(I + L)^{-1}$ is a compact operator
on $\mathcal{H}$. Then it follows from the Hilbert-Schmidt theorem (e.g. Reed and Simon
(1972, Theorem VI.16, p. 203)) that there exists a countable orthonormal basis
for $\mathcal{H}$ which consists of eigenvectors $\{\mathbf{g}_i\}_{i=1}^{\infty}$ of $(I + L)^{-p_1}$,

$$ (I + L)^{-p_1}\mathbf{g}_i = \mu_i \mathbf{g}_i $$

for $i = 1, 2, \ldots$, where the corresponding eigenvalues $\{\mu_i\}_{i=1}^{\infty}$ satisfy
$1 \ge \mu_1 \ge \mu_2 \ge \cdots$, with $\mu_i \to 0$ as $i \to \infty$. Moreover, $\mu_i > 0$ for $i = 1, 2, \ldots$, for
suppose otherwise. Then there is a first zero eigenvalue, call it $\mu_{M+1}$, since the
eigenvalues decrease monotonically toward zero. Therefore $(I + L)^{-p_1}$ has finite
rank $M$, hence $I + L$ is defined everywhere in $\mathcal{H}$ and also has rank $M$. But
$\operatorname{rank}(I + L) \ge \operatorname{rank} I = \infty$ since $L$ is positive semidefinite and $\mathcal{H}$ was assumed
infinite-dimensional, a contradiction.

Now define $\{\lambda_i\}_{i=1}^{\infty}$ by $(1 + \lambda_i)^{-p_1} = \mu_i$. Then $0 \le \lambda_1 \le \lambda_2 \le \cdots$, with
$\lambda_i \to \infty$ as $i \to \infty$, and $\lambda_i < \infty$ for $i = 1, 2, \ldots$ since $\mu_i > 0$ for $i = 1, 2, \ldots$.
Since the function $\lambda(\mu) = \mu^{-1/p_1} - 1$ is measurable and finite for $\mu \in (0, 1]$,
it follows from the functional calculus for self-adjoint operators (e.g. Riesz and
Sz.-Nagy (1955, pp. 343-346), Reed and Simon (1972, pp. 259-264)) that

$$ L\mathbf{g}_i = \lambda_i \mathbf{g}_i $$

for $i = 1, 2, \ldots$, and similarly for all $p \ge 0$ that

$$ (I + L)^p \mathbf{g}_i = (1 + \lambda_i)^p \mathbf{g}_i $$

for $i = 1, 2, \ldots$, with $(I + L)^p$ densely defined and self-adjoint in $\mathcal{H}$ for all $p \ge 0$.
For each $p \ge 0$, denote by $\Phi_p$ the domain of definition of $(I + L)^p$, i.e.,

$$ \Phi_p = \{\mathbf{h} \in \mathcal{H} : \|(I + L)^p \mathbf{h}\| < \infty\}. $$

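In particular, $\Phi_0 = \mathcal{H}$. Now

$$ \|(I + L)^p \mathbf{h}\|^2 = \sum_{i=1}^{\infty} ((I + L)^p \mathbf{h}, \mathbf{g}_i)^2 = \sum_{i=1}^{\infty} (\mathbf{h}, (I + L)^p \mathbf{g}_i)^2 $$
$$ = \sum_{i=1}^{\infty} (\mathbf{h}, (1 + \lambda_i)^p \mathbf{g}_i)^2 = \sum_{i=1}^{\infty} (1 + \lambda_i)^{2p} (\mathbf{h}, \mathbf{g}_i)^2 $$

for each $p \ge 0$, where the first equality is Parseval's relation and the second one
is due to the fact that $(I + L)^p$ is self-adjoint. Thus for each $p \ge 0$, $\Phi_p$ is given
explicitly by

$$ \Phi_p = \left\{\mathbf{h} \in \mathcal{H} : \sum_{i=1}^{\infty} (1 + \lambda_i)^{2p} (\mathbf{h}, \mathbf{g}_i)^2 < \infty\right\}. $$

Using this formula, it can be checked that for each $p \ge 0$, $\Phi_p$ is an inner
product space, with inner product $(\cdot, \cdot)_p$ defined by

$$ (\mathbf{g}, \mathbf{h})_p = \sum_{i=1}^{\infty} (1 + \lambda_i)^{2p} (\mathbf{g}, \mathbf{g}_i)(\mathbf{h}, \mathbf{g}_i) = ((I + L)^p \mathbf{g}, (I + L)^p \mathbf{h}) $$

for all $\mathbf{g}, \mathbf{h} \in \Phi_p$, and corresponding norm $\|\cdot\|_p$ defined by

$$ \|\mathbf{h}\|_p^2 = (\mathbf{h}, \mathbf{h})_p = \|(I + L)^p \mathbf{h}\|^2 $$

for all $\mathbf{h} \in \Phi_p$. It follows also that if $0 \le q \le r$, then $\|\mathbf{h}\| = \|\mathbf{h}\|_0 \le \|\mathbf{h}\|_q \le \|\mathbf{h}\|_r$
for all $\mathbf{h} \in \Phi_r$, and therefore that $\Phi_r \subset \Phi_q \subset \mathcal{H}$.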
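The explicit formula for the norm on the spaces Phi_p lends itself to a small numerical illustration. The sketch below is only an example under stated assumptions: the eigenvalues are taken to be lambda_i = i - 1 (so that 1 + lambda_i = i), the coefficients (h, g_i) are taken to be 1/i^2, and the series is truncated at a finite number of terms. None of these choices come from the text; the function names are hypothetical.

```python
import math

def phi_norm(coeffs, lambdas, p):
    """Truncated norm ||h||_p = sqrt(sum_i (1 + lambda_i)^(2p) (h, g_i)^2)."""
    return math.sqrt(sum((1.0 + lam) ** (2.0 * p) * c * c
                         for c, lam in zip(coeffs, lambdas)))

def truncated_example(n_terms):
    """Assumed example: 1 + lambda_i = i and (h, g_i) = 1 / i^2."""
    lambdas = [float(i - 1) for i in range(1, n_terms + 1)]
    coeffs = [1.0 / i ** 2 for i in range(1, n_terms + 1)]
    return coeffs, lambdas

coeffs, lambdas = truncated_example(10000)

# The norms are nondecreasing in p, as in property (iv).
for p in (0.0, 0.5, 1.0):
    print(p, phi_norm(coeffs, lambdas, p))

# With these choices ||h||_p^2 = sum_i i^(2p - 4), which is finite only
# for p < 3/2, so this h lies in Phi_p for p < 3/2 but not in Phi_{3/2}.
small = truncated_example(100)
print(phi_norm(*small, 1.5), phi_norm(coeffs, lambdas, 1.5))
```

The last line shows the truncated norm for p = 3/2 still growing with the truncation length, which is how membership in a smaller space of the decreasing family fails.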

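Each inner product space $\Phi_p$, $p \ge 0$, is in fact a Hilbert space, i.e., is already
complete in the norm $\|\cdot\|_p$. To see this, suppose that $\{\mathbf{h}_n\}_{n=1}^{\infty}$ is a Cauchy
sequence in $\Phi_p$ for some fixed $p > 0$, i.e. that $\|\mathbf{h}_n - \mathbf{h}_m\|_p \to 0$ as $n, m \to \infty$.
Since $\|\mathbf{h}_n - \mathbf{h}_m\| \le \|\mathbf{h}_n - \mathbf{h}_m\|_p$ for all $n, m \ge 1$, it follows that $\{\mathbf{h}_n\}_{n=1}^{\infty}$ is also
a Cauchy sequence in $\mathcal{H}$, and since $\mathcal{H}$ is complete, the sequence converges to a
unique element $\mathbf{h}_\infty \in \mathcal{H}$. It remains to show that in fact $\mathbf{h}_n \to \mathbf{h}_\infty \in \Phi_p$ as
$n \to \infty$.

Now

$$ \|\mathbf{h}_n - \mathbf{h}_m\|_p^2 = \|(I + L)^p (\mathbf{h}_n - \mathbf{h}_m)\|^2 = \sum_{i=1}^{\infty} (1 + \lambda_i)^{2p} (\mathbf{h}_n - \mathbf{h}_m, \mathbf{g}_i)^2. $$

Thus, that $\{\mathbf{h}_n\}_{n=1}^{\infty}$ is a Cauchy sequence in $\Phi_p$ means that, given any $\epsilon > 0$,
there exists an $M = M(\epsilon)$ such that, for all $n, m \ge M$,

$$ \sum_{i=1}^{l} (1 + \lambda_i)^{2p} (\mathbf{h}_n - \mathbf{h}_m, \mathbf{g}_i)^2 < \epsilon $$

for any $l \ge 1$. But for each $i = 1, 2, \ldots$,

$$ |(\mathbf{h}_m - \mathbf{h}_\infty, \mathbf{g}_i)| \le \|\mathbf{h}_m - \mathbf{h}_\infty\| \, \|\mathbf{g}_i\| = \|\mathbf{h}_m - \mathbf{h}_\infty\| \to 0 \text{ as } m \to \infty, $$

hence $(\mathbf{h}_m, \mathbf{g}_i) \to (\mathbf{h}_\infty, \mathbf{g}_i)$ as $m \to \infty$, and therefore

$$ \sum_{i=1}^{l} (1 + \lambda_i)^{2p} (\mathbf{h}_n - \mathbf{h}_\infty, \mathbf{g}_i)^2 \le \epsilon $$

for all $n \ge M$ and $l \ge 1$. Letting $l \to \infty$ then gives

$$ \|\mathbf{h}_n - \mathbf{h}_\infty\|_p^2 = \sum_{i=1}^{\infty} (1 + \lambda_i)^{2p} (\mathbf{h}_n - \mathbf{h}_\infty, \mathbf{g}_i)^2 \le \epsilon $$

for all $n \ge M$, and therefore $\mathbf{h}_n \to \mathbf{h}_\infty \in \Phi_p$ as $n \to \infty$.

Thus, for each $p \ge 0$, $\Phi_p$ is a Hilbert space, with inner product $(\cdot, \cdot)_p$ and
corresponding norm $\|\cdot\|_p$. It can be checked that $\{(1 + \lambda_i)^{-p}\mathbf{g}_i\}_{i=1}^{\infty}$ is an
orthonormal basis for $\Phi_p$, for each $p \ge 0$.²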

B.2 The case $\mathcal{H} = L^2(S)$ with $L = -\Delta$

Now let $\mathcal{H} = L^2(S)$, the Hilbert space of real, Lebesgue square-integrable scalars
on the unit 2-sphere $S$, with inner product

$$ (\phi, \psi) = \int_S \phi(\mathbf{x})\psi(\mathbf{x})\,d\mathbf{x} $$

for all $\phi, \psi \in L^2(S)$, where $\mathbf{x} = (x_1, x_2)$ denotes spherical coordinates on $S$
and $d\mathbf{x}$ denotes the surface area element, and with corresponding norm
$\|\phi\| = (\phi, \phi)^{1/2}$ for all $\phi \in L^2(S)$. Let $L = -\Delta$, where $\Delta$ is the Laplacian operator
on $L^2(S)$. Thus $L$ is a densely defined, positive semidefinite, self-adjoint linear
operator on $L^2(S)$. Denote by $I$ the identity operator on $L^2(S)$.

It will be shown first that for all $p_1 > 1/2$, $(I - \Delta)^{-p_1}$ is a Hilbert-Schmidt
operator on $L^2(S)$, hence a compact operator on $L^2(S)$. By Appendix B.1, this
allows construction of the decreasing family of Hilbert spaces
$\{\Phi_p = \Phi_p(S), p \ge 0\}$,

$$ \Phi_p = \{\phi \in L^2(S) : \|(I - \Delta)^p \phi\| < \infty\}, $$

with inner product

$$ (\phi, \psi)_p = ((I - \Delta)^p \phi, (I - \Delta)^p \psi) $$

for all $\phi, \psi \in \Phi_p$, and corresponding norm $\|\phi\|_p = (\phi, \phi)_p^{1/2}$ for all $\phi \in \Phi_p$. Thus
if $\phi \in \Phi_p$ and $p$ is a positive integer or half-integer, then all partial (directional)
derivatives of $\phi$ up to order $2p$ are Lebesgue square-integrable.

Second, a Sobolev-type lemma for the sphere will be established, showing
that if $\phi \in \Phi_{1/2+q}$ with $q > 0$, then $\phi$ is a bounded function on $S$, with bound

$$ \max_{\mathbf{x} \in S} |\phi(\mathbf{x})|^2 \le \frac{1}{4\pi}\left(1 + \frac{1}{2q}\right)\|\phi\|_{1/2+q}^2. \tag{35} $$

It follows that if $\phi \in \Phi_{1+q}$ with $q > 0$, then the first partial derivatives of $\phi$
are bounded functions on the sphere, and in particular that $\Phi_{1+q} \subset C^0(S)$,
the space of continuous functions on the sphere. It will be shown that, in fact,
if $\phi \in \Phi_{1+q}$ with $q > 0$, then $\phi$ is Lipschitz continuous on $S$. Thus, for any
$q > 0$ and any nonnegative integer $l$, $\Phi_{1+l/2+q} \subset C^l(S)$, the space of functions
with $l$ continuous partial derivatives on the sphere, and in fact all of the partial
derivatives up to order $l$ of a function $\phi \in \Phi_{1+l/2+q}$ are Lipschitz continuous.

² It follows that, for any sequence $\{r_n\}_{n=0}^{\infty}$ with $0 \le r_0 < r_1 < r_2 < \cdots \to \infty$,
$\Phi = \bigcap_{n=0}^{\infty} \Phi_{r_n}$ is a separable Frechet space, and since the norms $\|\cdot\|_{r_n}$ are Hilbertian
seminorms on $\Phi$, also a countably Hilbertian space. If $(I + L)^{-p_1}$ is not just compact but
in fact Hilbert-Schmidt, and if, for instance, $p_n = np_1$, then $\Phi = \bigcap_{n=0}^{\infty} \Phi_{p_n}$ is a countably
Hilbertian nuclear space, and it is possible to define $\Phi'$-valued random variables, where $\Phi'$
is the dual space of $\Phi$. Such random variables are useful for stochastic differential equations
in infinite-dimensional spaces (see the books of Ito (1984) and Kallianpur and Xiong (1995)),
but are not immediately important for the principle of energetic consistency developed in
this chapter.

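These results carry over to vectors in the usual way. Thus denoting by $L^2(S)$
also the Hilbert space of real, Lebesgue square-integrable $n$-vectors on $S$, the
inner product is

$$ (\mathbf{g}, \mathbf{h}) = \int_S \mathbf{g}^T(\mathbf{x})\mathbf{h}(\mathbf{x})\,d\mathbf{x} = \sum_{i=1}^{n} (g_i, h_i) $$

for all $\mathbf{g}, \mathbf{h} \in L^2(S)$, and the corresponding norm is $\|\mathbf{h}\| = (\mathbf{h}, \mathbf{h})^{1/2}$ for all
$\mathbf{h} \in L^2(S)$. Thus for $n$-vectors on $S$, the Hilbert spaces $\Phi_p$, $p \ge 0$, are defined
by

$$ \Phi_p = \{\mathbf{h} \in L^2(S) : \|(I - \Delta)^p \mathbf{h}\| < \infty\}, $$

with inner product

$$ (\mathbf{g}, \mathbf{h})_p = ((I - \Delta)^p \mathbf{g}, (I - \Delta)^p \mathbf{h}) = \sum_{i=1}^{n} (g_i, h_i)_p $$

for all $\mathbf{g}, \mathbf{h} \in \Phi_p$, and corresponding norm $\|\mathbf{h}\|_p = (\mathbf{h}, \mathbf{h})_p^{1/2}$ for all $\mathbf{h} \in \Phi_p$.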

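To establish that $(I - \Delta)^{-p}$ is a Hilbert-Schmidt operator on $L^2(S)$ if $p > 1/2$,
note first that

$$ \sum_{l=0}^{\infty} \frac{2l + 1}{[1 + l(l + 1)]^{1+2\epsilon}} < 1 + \frac{1}{2\epsilon} \tag{36} $$

if $\epsilon > 0$. To obtain this inequality, let

$$ f(x) = \frac{2x + 1}{[1 + x(x + 1)]^{1+2\epsilon}} $$

for $x \ge 0$ and $\epsilon > 0$. Then $f$ is monotone decreasing for $x \ge 1/2$, and $f(0) \ge f(1)$,
and so

$$ \sum_{l=0}^{\infty} \frac{2l + 1}{[1 + l(l + 1)]^{1+2\epsilon}} = f(0) + \sum_{l=1}^{\infty} f(l) < f(0) + \int_0^{\infty} f(x)\,dx = 1 + \frac{1}{2\epsilon}. $$

The sum in Eq. (36) diverges logarithmically for $\epsilon = 0$.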
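Inequality (36) is easy to check numerically. The short sketch below, which is only an illustration and uses hypothetical function names, compares partial sums of the series with the bound 1 + 1/(2 epsilon) for a few values of epsilon, and shows the slow growth of the partial sums when epsilon = 0.

```python
def term(l, eps):
    """l-th term (2l + 1) / [1 + l(l + 1)]^(1 + 2 eps) of the series in Eq. (36)."""
    return (2 * l + 1) / (1.0 + l * (l + 1)) ** (1.0 + 2.0 * eps)

def partial_sum(eps, n_terms):
    """Sum of the first n_terms terms, l = 0, ..., n_terms - 1."""
    return sum(term(l, eps) for l in range(n_terms))

def bound(eps):
    """Right-hand side of Eq. (36); requires eps > 0."""
    return 1.0 + 1.0 / (2.0 * eps)

for eps in (0.05, 0.5, 2.0):
    print(eps, partial_sum(eps, 20000), bound(eps))

# For eps = 0 the partial sums keep growing, roughly like 2 log(n).
print(partial_sum(0.0, 100), partial_sum(0.0, 10000))
```

Since every term is positive, each partial sum is below the full sum, so a partial sum staying under the bound is consistent with, though of course not a proof of, the inequality.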

Now let $C = (I - \Delta)^{-p}$ with $p > 0$. Thus $C$ is a bounded operator from
$L^2(S)$ into $L^2(S)$, with $\|C\phi\| \le \|\phi\|$ for all $\phi \in L^2(S)$. The real and imaginary
parts of the spherical harmonics $Y_l^m$ form an orthonormal basis for $L^2(S)$, and

$$ \Delta Y_l^m = -l(l + 1) Y_l^m $$

for $l \ge 0$ and $|m| \le l$. Thus

$$ C Y_l^m = \lambda_l^m Y_l^m $$

with eigenvalues $\lambda_l^m = (Y_l^m, C Y_l^m) = [1 + l(l + 1)]^{-p}$ for $l \ge 0$ and $|m| \le l$. But

$$ \sum_{l=0}^{\infty} \sum_{m=-l}^{l} (\lambda_l^m)^2 = \sum_{l=0}^{\infty} \frac{2l + 1}{[1 + l(l + 1)]^{2p}}, $$

and so this sum is finite for $p > 1/2$ by Eq. (36). Hence $C$ is Hilbert-Schmidt
for $p > 1/2$.

To establish the bound of Eq. (35), suppose that $\phi \in \Phi_{1/2+q}$ with $q > 0$.
Thus $(I - \Delta)^{1/2+q}\phi \in L^2(S)$ and has a spherical harmonic expansion

$$ (I - \Delta)^{1/2+q}\phi = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \beta_l^m Y_l^m, $$

where the convergence is in $L^2(S)$, with

$$ \|\phi\|_{1/2+q}^2 = \|(I - \Delta)^{1/2+q}\phi\|^2 = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} |\beta_l^m|^2 < \infty. \tag{37} $$

Therefore

$$ \phi = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1 + l(l + 1)]^{-1/2-q} \beta_l^m Y_l^m, \tag{38} $$

where the convergence is in $\Phi_{1/2+q}$. It will be shown that this series converges
absolutely, hence pointwise, so that

$$ \phi(\mathbf{x}) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1 + l(l + 1)]^{-1/2-q} \beta_l^m Y_l^m(\mathbf{x}) $$

for each $\mathbf{x} \in S$. This will also give Eq. (35).

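Now,

$$ |\phi| \le \sum_{l=0}^{\infty} [1 + l(l + 1)]^{-1/2-q} \sum_{m=-l}^{l} |\beta_l^m| \, |Y_l^m|, $$

and so

$$ |\phi| \le \sum_{l=0}^{\infty} [1 + l(l + 1)]^{-1/2-q} \left\{\sum_{m=-l}^{l} |\beta_l^m|^2\right\}^{1/2} \left\{\sum_{m=-l}^{l} |Y_l^m|^2\right\}^{1/2} $$

by the Schwarz inequality. The spherical harmonic addition theorem says that

$$ P_l(\cos\gamma) = \frac{4\pi}{2l + 1} \sum_{m=-l}^{l} Y_l^m(\mathbf{x}) Y_l^{m*}(\mathbf{y}) $$

for $l \ge 0$, where $P_l$ is the $l$th Legendre polynomial and $\gamma$ is the angle between $\mathbf{x}$
and $\mathbf{y}$. This implies that

$$ \sum_{m=-l}^{l} |Y_l^m(\mathbf{x})|^2 = \frac{2l + 1}{4\pi} $$

for all $\mathbf{x} \in S$, and so

$$ |\phi| \le \frac{1}{\sqrt{4\pi}} \sum_{l=0}^{\infty} (2l + 1)^{1/2} [1 + l(l + 1)]^{-1/2-q} \left\{\sum_{m=-l}^{l} |\beta_l^m|^2\right\}^{1/2}. $$

Another application of the Schwarz inequality then gives

$$ |\phi| \le \frac{1}{\sqrt{4\pi}} \left\{\sum_{l=0}^{\infty} (2l + 1)[1 + l(l + 1)]^{-1-2q}\right\}^{1/2} \left\{\sum_{l=0}^{\infty} \sum_{m=-l}^{l} |\beta_l^m|^2\right\}^{1/2} $$

or, using Eq. (37),

$$ |\phi|^2 \le \frac{1}{4\pi} \|\phi\|_{1/2+q}^2 \sum_{l=0}^{\infty} \frac{2l + 1}{[1 + l(l + 1)]^{1+2q}}. $$

Therefore, by Eq. (36), the sum in Eq. (38) converges absolutely, and Eq. (35)
holds.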

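Now suppose that $\phi \in \Phi_{1+q}$ with $q > 0$. To establish that $\phi$ is Lipschitz
continuous on $S$, note first that by the previous result,

$$ \phi(\mathbf{x}) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1 + l(l + 1)]^{-1-q} \beta_l^m Y_l^m(\mathbf{x}) $$

for each $\mathbf{x} \in S$, where

$$ \|\phi\|_{1+q}^2 = \|(I - \Delta)^{1+q}\phi\|^2 = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} |\beta_l^m|^2 < \infty. \tag{39} $$

Therefore,

$$ |\phi(\mathbf{x}) - \phi(\mathbf{y})| \le \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1 + l(l + 1)]^{-1-q} |\beta_l^m| \, |Y_l^m(\mathbf{x}) - Y_l^m(\mathbf{y})| $$

for each $\mathbf{x}, \mathbf{y} \in S$, and so by the Schwarz inequality,

$$ |\phi(\mathbf{x}) - \phi(\mathbf{y})| \le \sum_{l=0}^{\infty} [1 + l(l + 1)]^{-1-q} \left\{\sum_{m=-l}^{l} |\beta_l^m|^2\right\}^{1/2} \left\{\sum_{m=-l}^{l} |Y_l^m(\mathbf{x}) - Y_l^m(\mathbf{y})|^2\right\}^{1/2}. $$

By the spherical harmonic addition theorem,

$$ \sum_{m=-l}^{l} |Y_l^m(\mathbf{x}) - Y_l^m(\mathbf{y})|^2 = \sum_{m=-l}^{l} \left[|Y_l^m(\mathbf{x})|^2 - 2\,\mathrm{Re}\, Y_l^m(\mathbf{x}) Y_l^{m*}(\mathbf{y}) + |Y_l^m(\mathbf{y})|^2\right] = \frac{2l + 1}{2\pi}\left[1 - P_l(\cos\gamma)\right], $$

where $\gamma = \gamma(\mathbf{x}, \mathbf{y})$ is the angle between $\mathbf{x}$ and $\mathbf{y}$. Therefore,

$$ |\phi(\mathbf{x}) - \phi(\mathbf{y})| \le \sum_{l=0}^{\infty} [1 + l(l + 1)]^{-1-q} \left(\frac{2l + 1}{2\pi}\right)^{1/2} \left[1 - P_l(\cos\gamma)\right]^{1/2} \left\{\sum_{m=-l}^{l} |\beta_l^m|^2\right\}^{1/2}, $$

and so by Eq. (39) and the Schwarz inequality,

$$ |\phi(\mathbf{x}) - \phi(\mathbf{y})| \le \frac{1}{\sqrt{2\pi}} \left\{\sum_{l=0}^{\infty} [1 + l(l + 1)]^{-2-2q} (2l + 1)\left[1 - P_l(\cos\gamma)\right]\right\}^{1/2} \|\phi\|_{1+q}. $$

Now, $P_l(1) = 1$, $P_l'(1) = l(l + 1)/2$, and $P_l''(1) = [l(l + 1) - 2]P_l'(1)/4 \ge 0$ for
$l \ge 0$. It follows that for $\gamma$ sufficiently small,

$$ 1 - P_l(\cos\gamma) \le (1 - \cos\gamma)P_l'(1) = l(l + 1)\sin^2\frac{\gamma}{2}, $$

and so

$$ |\phi(\mathbf{x}) - \phi(\mathbf{y})| \le \frac{K}{\sqrt{2\pi}} \|\phi\|_{1+q} \sin\frac{\gamma(\mathbf{x}, \mathbf{y})}{2}, \tag{40} $$

where

$$ K^2 = \sum_{l=0}^{\infty} [1 + l(l + 1)]^{-2-2q} (2l + 1)\, l(l + 1). $$

This series converges for $q > 0$ since the terms decay like $l^{-1-4q}$, and Eq. (40)
shows that $\phi$ is Lipschitz continuous.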
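The Legendre polynomial inequality used to obtain Eq. (40) can be examined numerically. The sketch below is an illustration only: it evaluates P_l by the standard three-term recurrence, checks the inequality 1 - P_l(cos gamma) <= l(l + 1) sin^2(gamma/2) on a grid of angles, and computes partial sums of the series defining K^2. The function names and the grid are choices made here, not taken from the text.

```python
import math

def legendre(l, x):
    """Legendre polynomial P_l(x) via (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def lipschitz_term_holds(l, gamma, tol=1e-9):
    """Check 1 - P_l(cos gamma) <= l(l+1) sin^2(gamma/2), up to roundoff."""
    lhs = 1.0 - legendre(l, math.cos(gamma))
    rhs = l * (l + 1) * math.sin(0.5 * gamma) ** 2
    return lhs <= rhs + tol

def k_squared(q, n_terms):
    """Partial sum of K^2 = sum_l [1 + l(l+1)]^(-2-2q) (2l+1) l (l+1)."""
    return sum((2 * l + 1) * l * (l + 1) / (1.0 + l * (l + 1)) ** (2.0 + 2.0 * q)
               for l in range(n_terms))

angles = [0.01 * k for k in range(315)]          # grid on [0, pi], an arbitrary choice
print(all(lipschitz_term_holds(l, g) for l in range(40) for g in angles))
print(k_squared(0.5, 2000), k_squared(0.5, 4000))  # partial sums settle for q > 0
```

On the grid tested here the inequality holds for every angle, not only for small ones, and the partial sums of K^2 change very little once a few thousand terms are included, in line with the stated decay of the terms.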


C Some basic concepts and definitions 

This appendix summarizes background material used elsewhere in this article. 
For further treatment see, for instance, Doob (1953), Royden (1968), and Reed 
and Simon (1972). 

C.1 Measure spaces

Let $X$ be a set. A collection $\mathcal{C}$ of subsets of $X$ is called a $\sigma$-algebra, or Borel
field, if (i) the empty set $\emptyset$ is in $\mathcal{C}$, (ii) for every set $A \in \mathcal{C}$, the complement
$\bar{A} = \{x \in X : x \notin A\}$ of $A$ is in $\mathcal{C}$, and (iii) for every countable collection
$\{A_i\}_{i=1}^{\infty}$ of sets $A_i \in \mathcal{C}$, the union $\bigcup_{i=1}^{\infty} A_i$ of the sets is in $\mathcal{C}$. Given any collection
$\mathcal{A}$ of subsets of $X$, there is a smallest $\sigma$-algebra which contains $\mathcal{A}$, i.e., there is
a $\sigma$-algebra $\mathcal{C}$ such that (i) $\mathcal{A} \subset \mathcal{C}$, and (ii) if $\mathcal{B}$ is a $\sigma$-algebra and $\mathcal{A} \subset \mathcal{B}$ then
$\mathcal{C} \subset \mathcal{B}$. The smallest $\sigma$-algebra containing a given collection $\mathcal{A}$ of subsets of $X$
is called the Borel field of $X$ generated by $\mathcal{A}$. A measurable space is a couple
$(X, \mathcal{C})$ consisting of a set $X$ and a $\sigma$-algebra $\mathcal{C}$ of subsets of $X$. If $(X, \mathcal{C})$ is a
measurable space and $Y \in \mathcal{C}$, then $(Y, \mathcal{C}_Y)$ is a measurable space, where

$$ \mathcal{C}_Y = \{A \in \mathcal{C} : A \subset Y\}, $$

i.e., $\mathcal{C}_Y$ consists of all the sets in $\mathcal{C}$ that are subsets of $Y$.
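For a finite set X the smallest sigma-algebra containing a given collection can be computed directly, which may help make the definition concrete. The following is a minimal sketch with a hypothetical function name; it relies on the fact that on a finite set, closure under complements and pairwise unions already gives closure under countable unions.

```python
from itertools import combinations

def generate_sigma_algebra(universe, collection):
    """Smallest sigma-algebra on a finite set containing every set in `collection`."""
    universe = frozenset(universe)
    sets = {frozenset(), universe} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # close under complements
        for a in current:
            comp = universe - a
            if comp not in sets:
                sets.add(comp)
                changed = True
        # close under (pairwise, hence finite) unions
        for a, b in combinations(current, 2):
            u = a | b
            if u not in sets:
                sets.add(u)
                changed = True
    return sets

X = {1, 2, 3}
for member in sorted(generate_sigma_algebra(X, [{1}]), key=lambda s: (len(s), sorted(s))):
    print(set(member))
```

Generating from the single set {1} on X = {1, 2, 3} yields four sets: the empty set, {1}, {2, 3} and X itself.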

The set $\mathbb{R}^e$ of extended real numbers is the union of the set $\mathbb{R}$ of real numbers
and the sets $\{\infty\}$ and $\{-\infty\}$. Multiplication of any two extended real numbers
is defined as usual, with the convention that $0 \cdot \infty = 0$. Addition and subtraction
of any two extended real numbers is also defined, except that $\infty - \infty$ is undefined,
as usual.

Let $Y$ and $Z$ be two sets. A function $g$ is called a map from $Y$ into $Z$, written
$g : Y \to Z$, if $g(y)$ is defined for all $y \in Y$ and $g(y) \in Z$ for all $y \in Y$. Thus a
map $g : \mathbb{R} \to \mathbb{R}$ is a real-valued function defined on all of the real line, a map
$g : Y \to \mathbb{R}$ is a real-valued function defined on all of $Y$, and a map $g : Y \to \mathbb{R}^e$
is an extended real-valued function defined on all of $Y$.

Let $(X, \mathcal{C})$ be a measurable space. A subset $A$ of $X$ is called measurable if
$A \in \mathcal{C}$. A map $g : X \to \mathbb{R}^e$ is called measurable (with respect to $\mathcal{C}$) if

$$ \{x \in X : g(x) \le \alpha\} \in \mathcal{C}, $$

for every $\alpha \in \mathbb{R}$. If $g : X \to \mathbb{R}^e$ is measurable then $|g|$ is measurable, and if
$h : X \to \mathbb{R}^e$ is another measurable map then $gh$ is measurable. A measure $\mu$ on
$(X, \mathcal{C})$ is a map $\mu : \mathcal{C} \to \mathbb{R}^e$ that satisfies (i) $\mu(A) \ge 0$ for every measurable set
$A$, (ii) $\mu(\emptyset) = 0$, and (iii)

$$ \mu\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mu(E_i) $$

for every countable collection $\{E_i\}_{i=1}^{\infty}$ of disjoint measurable sets, i.e., for every
countable collection of sets $E_i \in \mathcal{C}$ with $E_i \cap E_j = \emptyset$ for $i \ne j$. A measure space
$(X, \mathcal{C}, \mu)$ is a measurable space $(X, \mathcal{C})$ together with a measure $\mu$ on $(X, \mathcal{C})$.

Let $(X, \mathcal{C}, \mu)$ be a measure space. A condition $C(x)$ defined for all $x \in X$ is
said to hold almost everywhere (a.e.) (with respect to $\mu$) if the set
$E = \{x \in X : C(x) \text{ is false}\}$ on which it fails to hold is a measurable set of measure zero,
i.e., $E \in \mathcal{C}$ and $\mu(E) = 0$. In particular, two maps $g : X \to \mathbb{R}^e$ and $h : X \to \mathbb{R}^e$
are said to be equal almost everywhere, written $g = h$ a.e., if the subset of $X$
on which they are not equal is a measurable set of measure zero.

A measure space $(X, \mathcal{C}, \mu)$ is called complete if $\mathcal{C}$ contains all subsets of
measurable sets of measure zero, i.e., if $B \in \mathcal{C}$, $\mu(B) = 0$, and $A \subset B$ together
imply that $A \in \mathcal{C}$. If $(X, \mathcal{C}, \mu)$ is a complete measure space and $A$ is a subset
of a measurable set of measure zero, then $\mu(A) = 0$. If $(X, \mathcal{C}, \mu)$ is a measure
space then there is a complete measure space $(X, \mathcal{C}_0, \mu_0)$, called the completion
of $(X, \mathcal{C}, \mu)$, which is determined uniquely by the conditions that (i) $\mathcal{C} \subset \mathcal{C}_0$, (ii)
if $D \in \mathcal{C}$ then $\mu(D) = \mu_0(D)$, and (iii) $D \in \mathcal{C}_0$ if and only if $D = A \cup B$ where
$B \in \mathcal{C}$ and $A \subset C \in \mathcal{C}$ with $\mu(C) = 0$. Thus a measure space can always be
completed by enlarging its $\sigma$-algebra to include the subsets of measurable sets
of measure zero and extending its measure so that the domain of definition of
the extended measure includes the enlarged $\sigma$-algebra.

An open interval on the real number line $\mathbb{R}$ is a set $(\alpha, \beta) = \{x \in \mathbb{R} : \alpha < x < \beta\}$
with $\alpha, \beta \in \mathbb{R}^e$ and $\alpha \le \beta$. Denote by $\mathcal{B}(\mathbb{R})$ the Borel field of $\mathbb{R}$ generated
by the open intervals, and denote by $\mathcal{I}(\mathbb{R}) \subset \mathcal{B}(\mathbb{R})$ the sets that are countable
unions of disjoint open intervals. For each set $I = \bigcup_{i=1}^{\infty} (\alpha_i, \beta_i) \in \mathcal{I}(\mathbb{R})$, define

$$ m^*(I) = \sum_{i=1}^{\infty} (\beta_i - \alpha_i), $$

and for each set $B \in \mathcal{B}(\mathbb{R})$ define

$$ m^*(B) = \inf m^*(I), $$

where the infimum (greatest lower bound) is taken over all those $I \in \mathcal{I}(\mathbb{R})$ such
that $B \subset I$. Then $m^*$ is a measure on the measurable space $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$. The
completion of the measure space $(\mathbb{R}, \mathcal{B}(\mathbb{R}), m^*)$ is denoted by $(\mathbb{R}, \mathcal{M}, m)$. The
sets in $\mathcal{M}$ are called the Lebesgue measurable sets on $\mathbb{R}$, and $m$ is called Lebesgue
measure on $\mathbb{R}$.

Let $(X, \mathcal{C}, \mu)$ be a complete measure space, and let $g : X \to \mathbb{R}^e$ and
$h : X \to \mathbb{R}^e$ be two maps. If $g$ is measurable and $g = h$ a.e., then $h$ is measurable.


C.2 Integration 

In this subsection let $(X, \mathcal{C}, \mu)$ be a measure space. The characteristic function
$\chi_A$ of a subset $A$ of $X$ is the map $\chi_A : X \to \{0, 1\}$ defined for each $x \in X$ by

$$ \chi_A(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases}. $$

A characteristic function $\chi_A$ is a measurable map if, and only if, $A$ is a measurable
set. A map $\phi : X \to \mathbb{R}^e$ is called simple if it is measurable and takes
on only a finite number of values. Thus the characteristic function of a measurable
set is simple, and if $\phi$ is simple and takes on the values $\alpha_1, \ldots, \alpha_n$ then
$\phi = \sum_{i=1}^{n} \alpha_i \chi_{E_i}$, where $E_i = \{x \in X : \phi(x) = \alpha_i\} \in \mathcal{C}$ for $i = 1, \ldots, n$. If $\phi$ is
simple and the values $\alpha_1, \ldots, \alpha_n$ it takes on are all nonnegative, the integral of
$\phi$ over a measurable set $E$ with respect to measure $\mu$ is defined as

$$ \int_E \phi\,d\mu = \sum_{i=1}^{n} \alpha_i \mu(E_i \cap E), $$

where $E_i = \{x \in X : \phi(x) = \alpha_i\}$ for $i = 1, \ldots, n$. It is possible that
$\int_E \phi\,d\mu = \infty$, for instance if $\alpha_1 \ne 0$ and $\mu(E_1 \cap E) = \infty$, or if $\alpha_1 = \infty$ and $\mu(E_1 \cap E) \ne 0$.

Let $E$ be a measurable set and let $g : X \to \mathbb{R}^e$ be a map which is nonnegative,
i.e., $g(x) \ge 0$ for all $x \in X$. If $g$ is measurable, the integral of $g$ over $E$ with
respect to $\mu$ is defined as

$$ \int_E g\,d\mu = \sup \int_E \phi\,d\mu, $$

where the supremum (least upper bound) is taken over all simple maps $\phi$ with
$0 \le \phi \le g$. Function $g$ is called integrable (over $E$, with respect to $\mu$) if $g$ is
measurable and

$$ \int_E g\,d\mu < \infty. $$

If $\{h_i\}_{i=1}^{\infty}$ is a collection of nonnegative measurable maps from $X$ into $\mathbb{R}^e$, then
$h = \sum_{i=1}^{\infty} h_i$ is a nonnegative measurable map from $X$ into $\mathbb{R}^e$ and

$$ \int_E h\,d\mu = \sum_{i=1}^{\infty} \int_E h_i\,d\mu, $$

and in particular, $h$ is integrable if and only if $\sum_{i=1}^{\infty} \int_E h_i\,d\mu < \infty$.
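The definition of the integral of a nonnegative simple map given earlier in this subsection can be made concrete on a finite set. The sketch below is an illustration under assumptions chosen here: X is finite, every subset is measurable, and the measure is specified by nonnegative point masses. The function name is hypothetical. The code applies the convention 0 times infinity = 0 explicitly.

```python
import math

def integrate_simple(phi, mu, E):
    """Integral over E of a nonnegative simple map with respect to a discrete measure.

    phi: dict mapping each point of a finite set X to a value in [0, inf]
    mu:  dict mapping each point of X to its nonnegative (possibly infinite) mass
    E:   the subset of X to integrate over
    """
    total = 0.0
    for alpha in set(phi.values()):
        E_i = {x for x, v in phi.items() if v == alpha}   # level set of phi
        mass = sum(mu[x] for x in E_i & set(E))           # mu(E_i intersect E)
        if alpha == 0 or mass == 0:
            continue                                      # convention 0 * inf = 0
        total += alpha * mass
    return total

phi = {"a": 2.0, "b": 2.0, "c": math.inf}
mu = {"a": 1.0, "b": 0.5, "c": 0.0}

print(integrate_simple(phi, mu, {"a", "b", "c"}))   # infinite value on a null set
print(integrate_simple(phi, {"a": 1.0, "b": 0.5, "c": 0.25}, {"a", "b", "c"}))
```

The first call returns a finite value because the point where phi is infinite carries zero mass; the second returns infinity because that point now has positive mass, matching the two cases described in the text.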

Let $E$ be a measurable set and let $g : X \to \mathbb{R}^e$ be a map. The positive part
$g^+$ of $g$ is the nonnegative map $g^+ = g \vee 0$, i.e., $g^+(x) = \max\{g(x), 0\}$ for each
$x \in X$, and the negative part $g^-$ is the nonnegative map $g^- = (-g) \vee 0$. Thus
$g = g^+ - g^-$ and $|g| = g^+ + g^-$. If $g$ is measurable, so are $g^+$ and $g^-$, as well
as $|g|$. Function $g$ is called integrable (over $E$, with respect to $\mu$) if both $g^+$ and
$g^-$ are integrable, in which case the integral of $g$ is defined as

$$ \int_E g\,d\mu = \int_E g^+\,d\mu - \int_E g^-\,d\mu. $$

Thus $g$ is integrable over $E$ if, and only if, $|g|$ is integrable over $E$, in which case

$$ \left|\int_E g\,d\mu\right| \le \int_E |g|\,d\mu < \infty. $$

If $g$ is integrable over $X$, then $|g| < \infty$ a.e., $g$ is integrable over $E$, and

$$ \int_E |g|\,d\mu \le \int_X |g|\,d\mu < \infty. $$

If $g$ is measurable, then

$$ \int_X |g|\,d\mu = 0 $$

if, and only if, $g = 0$ a.e.

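Let $E$ be a measurable set and let $g : X \to \mathbb{R}^e$ and $h : X \to \mathbb{R}^e$ be two maps.
If $g^2$ and $h^2$ are integrable over $E$ then $gh$ is integrable over $E$, and

$$ \left|\int_E gh\,d\mu\right| \le \int_E |gh|\,d\mu \le \left(\int_E g^2\,d\mu\right)^{1/2} \left(\int_E h^2\,d\mu\right)^{1/2} < \infty. \tag{41} $$

If $g$ and $h$ are integrable over $E$ and $g = h$ a.e., then

$$ \int_E g\,d\mu = \int_E h\,d\mu. $$

If the measure space is complete, and if $g$ is integrable over $E$ and $g = h$ a.e.,
then $h$ is integrable over $E$ and

$$ \int_E g\,d\mu = \int_E h\,d\mu. $$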

Now consider the complete measure space $(\mathbb{R}, \mathcal{M}, m)$, where $\mathcal{M}$ is the
$\sigma$-algebra of Lebesgue measurable sets on $\mathbb{R}$ and $m$ is Lebesgue measure on $\mathbb{R}$.
If $g : \mathbb{R} \to \mathbb{R}^e$ is measurable with respect to $\mathcal{M}$, and is either nonnegative or
integrable over $\mathbb{R}$ with respect to $m$, the integral of $g$ over a Lebesgue measurable
set $E$ is called the Lebesgue integral of $g$ over $E$, and is often written as

$$ \int_E g\,dm = \int_E g(x)\,dx. $$

A Borel measure on $\mathbb{R}$ is a measure defined on the Lebesgue measurable sets
$\mathcal{M}$ that is finite for bounded sets. If $F$ is a monotone increasing function on $\mathbb{R}$
that is continuous on the right, i.e., if $F(\beta) \ge F(\alpha)$ and $\lim_{\beta \to \alpha^+} F(\beta) = F(\alpha)$
for all $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$, then there exists a unique Borel measure $\mu$ on $\mathbb{R}$
such that

$$ \mu((\alpha, \beta]) = F(\beta) - F(\alpha) $$

for all $\alpha, \beta \in \mathbb{R}$ with $\alpha < \beta$, where $(\alpha, \beta] = \{x \in \mathbb{R} : \alpha < x \le \beta\}$. Let $F$ be
a monotone increasing function that is continuous on the right, and let $\mu$ be
the corresponding Borel measure. If $g : \mathbb{R} \to \mathbb{R}^e$ is measurable with respect
to $\mathcal{M}$, and is either nonnegative or integrable over $\mathbb{R}$ with respect to the Borel
measure $\mu$, the Lebesgue-Stieltjes integral of $g$ over a Lebesgue measurable set
$E$ is defined as

$$ \int_E g(x)\,dF(x) = \int_E g\,d\mu. $$


C.3 Probability 

A probability space is a measure space $(\Omega, \mathcal{F}, P)$ with $P(\Omega) = 1$. The set $\Omega$ is
called the sample space, the $\sigma$-algebra $\mathcal{F}$ of measurable sets is called the event
space, a measurable set is called an event, and $P$ is called the probability measure.
For the rest of this subsection, let $(\Omega, \mathcal{F}, P)$ be a probability space.

A measurable map from $\Omega$ into $\mathbb{R}^e$ is called a (scalar) random variable. Thus
a map $s : \Omega \to \mathbb{R}^e$ is a random variable if, and only if,

$$ \{\omega \in \Omega : s(\omega) \le x\} \in \mathcal{F} $$

for every $x \in \mathbb{R}$. In particular, if $s$ is a random variable then the function

$$ F_s(x) = P(\{\omega \in \Omega : s(\omega) \le x\}), $$

called the probability distribution function of $s$, is defined for all $x \in \mathbb{R}$. The
distribution function of a random variable is monotone increasing and continuous
on the right. If the distribution function $F_s$ of a random variable $s$ is an
indefinite integral, i.e., if

$$ F_s(x) = \int_{-\infty}^{x} f_s(y)\,dy $$

for all $x \in \mathbb{R}$ and some Lebesgue integrable function $f_s$, then $f_s$ is called the
probability density function of $s$, and $dF_s/dx = f_s$ a.e. (with respect to Lebesgue
measure) in $\mathbb{R}$.

The expectation operator $\mathcal{E}$ is the integration operator over $\Omega$ with respect
to probability measure. Thus if $s$ is a random variable then $\mathcal{E}|s|$ is defined, since
$|s|$ is a random variable and $|s| \ge 0$, and

$$ \mathcal{E}|s| = \int_\Omega |s|\,dP \le \infty, $$

while a random variable $s$ is integrable over $\Omega$ if, and only if, $\mathcal{E}|s| < \infty$, in which
case

$$ \mathcal{E}s = \int_\Omega s\,dP $$

and $|\mathcal{E}s| \le \mathcal{E}|s| < \infty$. If $s$ is a random variable with $\mathcal{E}|s| < \infty$, then $\bar{s} = \mathcal{E}s$
is called the mean of $s$, and the mean can be evaluated equivalently as the
Lebesgue-Stieltjes integral

$$ \bar{s} = \int_{-\infty}^{\infty} x\,dF_s(x), $$

where $F_s$ is the distribution function of $s$, hence

$$ \bar{s} = \int_{-\infty}^{\infty} x f_s(x)\,dx $$

if also $s$ has a density function $f_s$, where the integral is the Lebesgue integral.

If $s$ is a random variable then $\mathcal{E}s^2$ is defined, since $s^2 \ge 0$ is a random
variable, and either $\mathcal{E}s^2 = \infty$ or $\mathcal{E}s^2 < \infty$. A random variable $s$ is called
second-order if $\mathcal{E}s^2 < \infty$. If $r$ and $s$ are random variables then $\mathcal{E}|rs|$ is defined
since $rs$ is a random variable, and $\mathcal{E}|rs| \le \infty$. If $r$ and $s$ are second-order
random variables, then

$$ \mathcal{E}|rs| \le (\mathcal{E}r^2)^{1/2} (\mathcal{E}s^2)^{1/2} < \infty $$

by Eq. (41), hence $\mathcal{E}rs$ is defined and $|\mathcal{E}rs| \le \mathcal{E}|rs| < \infty$. In particular, on
taking $r = 1$ and using the fact that

$$ \mathcal{E}1 = \int_\Omega 1\,dP = P(\Omega) = 1, $$

it follows that if $s$ is a second-order random variable then its mean $\bar{s} = \mathcal{E}s$ is
defined, with

$$ 0 \le |\bar{s}| = |\mathcal{E}s| \le \mathcal{E}|s| \le (\mathcal{E}s^2)^{1/2} < \infty. $$

The variance $\sigma^2 = \mathcal{E}(s - \bar{s})^2$ of a second-order random variable $s$ is therefore
also defined, and finite, with

$$ 0 \le \sigma^2 = \mathcal{E}(s^2 - 2s\bar{s} + \bar{s}^2) = \mathcal{E}s^2 - \bar{s}^2 < \infty, $$

and

$$ \mathcal{E}s^2 = \bar{s}^2 + \sigma^2. \tag{42} $$
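Equation (42) can be verified exactly for a simple discrete random variable. The sketch below uses a fair six-sided die as an assumed example (it does not appear in the text) and exact rational arithmetic, so that the identity holds without roundoff. The helper name is hypothetical.

```python
from fractions import Fraction

def expectation(values, probs, fn=lambda x: x):
    """E fn(s) for a random variable taking values[i] with probability probs[i]."""
    return sum(p * fn(v) for v, p in zip(values, probs))

values = [1, 2, 3, 4, 5, 6]                 # assumed example: a fair die
probs = [Fraction(1, 6)] * 6

mean = expectation(values, probs)                                   # mean of s
second_moment = expectation(values, probs, lambda x: x * x)         # E s^2
variance = expectation(values, probs, lambda x: (x - mean) ** 2)    # E (s - mean)^2

print(mean, second_moment, variance)
print(second_moment == mean ** 2 + variance)    # Eq. (42)
```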

A condition $C(\omega)$ defined for all $\omega \in \Omega$ is said to hold with probability one
(wp1), or almost surely (a.s.), if it holds a.e. with respect to probability measure.
Thus if $s$ is a random variable, then $s = 0$ wp1 if, and only if, $\mathcal{E}|s| = 0$. If $s$ is
a random variable with $\mathcal{E}|s| < \infty$, i.e., if the mean of $s$ is defined, then $|s| < \infty$
wp1. If $r$ and $s$ are two random variables with $\mathcal{E}|r| < \infty$ and $\mathcal{E}|s| < \infty$, and if
$r = s$ wp1, then $r$ and $s$ have the same distribution function and, in particular,
$\mathcal{E}r = \mathcal{E}s$. If the probability space is complete, and if $s$ is a random variable,
$r : \Omega \to \mathbb{R}^e$ and $r = s$ wp1, then $r$ is a random variable and has the same
distribution function as $s$, and if, in addition, $\mathcal{E}|s| < \infty$, then $\mathcal{E}|r| < \infty$ and
$\mathcal{E}r = \mathcal{E}s$.

C.4 Hilbert space 

A nonempty set V is called a linear space or vector space (over the reals) if 
ag + j3 h £ V for all g, h £ V and a, (3 £ R. A norm on a linear space V is a 
real- valued function || • | such that, for all g.hGb and a £ R, (i) ||h|| > 0, (ii) 
||h|| = 0 if, and only if, h = 0, (iii) ||ah|| = \a\ ||h||, and (iv) ||g+h|| < ||g|| + ||h||. 
An inner product on a linear space V is a real-valued function (•,•) such that, 
for all f,g,h€ V and a £ R, (i) (h, h) > 0, (ii) (h, h) = 0 if, and only if, h = 0, 
(iii) (g, ah) = a(g,h), (iv) (f,g + h) = (f,g) + (f,h), and (v) (g, h) = (h,g). A 
normed linear space is a linear space equipped with a norm, and an inner product 
space is a linear space equipped with an inner product. Every inner product 
space Id is a normed linear space, with norm || • || given by ||h|| = (h, h) - 1 / 2 for 
all h £ V, where (•, •) is the inner product on V. A normed linear space V is an 
inner product space if, and only if, its norm |j • || satisfies the parallelogram law 

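$$\|\mathbf{g} + \mathbf{h}\|^2 + \|\mathbf{g} - \mathbf{h}\|^2 = 2\left(\|\mathbf{g}\|^2 + \|\mathbf{h}\|^2\right),$$

for all $\mathbf{g}, \mathbf{h} \in \mathcal{V}$. On every inner product space
$\mathcal{V}$, the inner product $(\cdot,\cdot)$ is given by the polarization
identity

$$(\mathbf{g}, \mathbf{h}) = \tfrac{1}{4}\left(\|\mathbf{g} + \mathbf{h}\|^2 - \|\mathbf{g} - \mathbf{h}\|^2\right),$$

for all $\mathbf{g}, \mathbf{h} \in \mathcal{V}$, where $\|\cdot\|$ is the norm
corresponding to the inner product, i.e.,
$\|\mathbf{h}\| = (\mathbf{h}, \mathbf{h})^{1/2}$ for all
$\mathbf{h} \in \mathcal{V}$. The Schwarz inequality

$$|(\mathbf{g}, \mathbf{h})| \le \|\mathbf{g}\|\,\|\mathbf{h}\| < \infty,$$

for all $\mathbf{g}, \mathbf{h} \in \mathcal{V}$, holds on every inner product
space $\mathcal{V}$, where $(\cdot,\cdot)$ is the inner product on $\mathcal{V}$
and $\|\cdot\|$ is the corresponding norm.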
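
The three relations above can be checked numerically in the simplest inner
product space, $\mathbb{R}^N$ with the Euclidean inner product. The following is
a minimal sketch under that assumption; the helper names (`inner`, `norm`,
`max_norm`) and the test vectors are chosen here for illustration only. It also
evaluates the parallelogram law for the max norm, which is a norm that does not
come from any inner product.

```python
import math

def inner(g, h):
    """Euclidean inner product (g, h) = sum_i g_i h_i on R^N."""
    return sum(gi * hi for gi, hi in zip(g, h))

def norm(h):
    """Norm induced by the inner product: ||h|| = (h, h)^(1/2)."""
    return math.sqrt(inner(h, h))

def max_norm(h):
    """Max norm on R^N (an assumption for contrast; not an inner product norm)."""
    return max(abs(hi) for hi in h)

def add(g, h):
    return [gi + hi for gi, hi in zip(g, h)]

def sub(g, h):
    return [gi - hi for gi, hi in zip(g, h)]

g = [1.0, -2.0, 3.0]   # arbitrary illustrative vectors
h = [0.5, 4.0, -1.0]

# Parallelogram law: ||g + h||^2 + ||g - h||^2 = 2 (||g||^2 + ||h||^2)
parallelogram_lhs = norm(add(g, h)) ** 2 + norm(sub(g, h)) ** 2
parallelogram_rhs = 2.0 * (norm(g) ** 2 + norm(h) ** 2)

# Polarization identity: (g, h) = (1/4) (||g + h||^2 - ||g - h||^2)
polarization = 0.25 * (norm(add(g, h)) ** 2 - norm(sub(g, h)) ** 2)

# Schwarz inequality: |(g, h)| <= ||g|| ||h||
schwarz_holds = abs(inner(g, h)) <= norm(g) * norm(h)

# The same parallelogram expression evaluated with the max norm on unit vectors
e1, e2 = [1.0, 0.0], [0.0, 1.0]
max_lhs = max_norm(add(e1, e2)) ** 2 + max_norm(sub(e1, e2)) ** 2
max_rhs = 2.0 * (max_norm(e1) ** 2 + max_norm(e2) ** 2)
```

For the max norm the two sides differ, so by the parallelogram criterion
$\mathbb{R}^2$ with the max norm is a normed linear space but not an inner
product space.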

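A subset $\mathcal{O}$ of a normed linear space $\mathcal{V}$ is called open in
$\mathcal{V}$ if for every $\mathbf{g} \in \mathcal{O}$, there exists an
$\epsilon > 0$ such that if $\mathbf{h} \in \mathcal{V}$ and
$\|\mathbf{g} - \mathbf{h}\| < \epsilon$ then $\mathbf{h} \in \mathcal{O}$. A
subset $\mathcal{B}$ of a normed linear space $\mathcal{V}$ is called dense in
$\mathcal{V}$ if for every $\mathbf{h} \in \mathcal{V}$ and $\epsilon > 0$,
there exists an element $\mathbf{g} \in \mathcal{B}$ such that
$\|\mathbf{g} - \mathbf{h}\| < \epsilon$. A normed linear space is called
separable if it has a dense subset that contains countably many elements.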

A sequence of elements $\mathbf{h}_1, \mathbf{h}_2, \ldots$ in a normed linear
space $\mathcal{V}$ is called a Cauchy sequence if
$\|\mathbf{h}_m - \mathbf{h}_n\| \to 0$ as $m, n \to \infty$. A sequence of
elements $\mathbf{h}_1, \mathbf{h}_2, \ldots$ in a normed linear space
$\mathcal{V}$ is said to converge in $\mathcal{V}$ if there exists an element
$\mathbf{h} \in \mathcal{V}$ such that $\|\mathbf{h} - \mathbf{h}_n\| \to 0$ as
$n \to \infty$, in which case one writes
$\mathbf{h} = \lim_{n \to \infty} \mathbf{h}_n$. A normed linear space
$\mathcal{V}$ is called complete if every Cauchy sequence of elements in
$\mathcal{V}$ converges in $\mathcal{V}$. A complete normed linear space is
called a Banach space. A Banach space on which the norm is defined by an inner
product is called a Hilbert space. That is, a Hilbert space is an inner product
space which is complete in the norm defined by the inner product.
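
As a small numerical illustration of the Cauchy condition, consider the
truncated sequences $\mathbf{h}_n = (1, 1/2, \ldots, 1/n, 0, 0, \ldots)$ with
the Euclidean norm. This particular sequence and the helper name `cauchy_gap`
are assumptions made for the sketch, not part of the text above. Since
$\|\mathbf{h}_m - \mathbf{h}_n\|^2 = \sum_{k=n+1}^{m} 1/k^2 < 1/n$ for $m > n$,
the gaps shrink as both indices grow.

```python
import math

def cauchy_gap(m, n):
    """||h_m - h_n|| for h_n = (1, 1/2, ..., 1/n, 0, 0, ...), Euclidean norm."""
    lo, hi = min(m, n), max(m, n)
    # h_m and h_n agree in the first `lo` entries, so only the tail contributes.
    return math.sqrt(sum(1.0 / k ** 2 for k in range(lo + 1, hi + 1)))

# Gaps between h_n and h_{2n} for increasing n; these decrease toward zero.
gaps = [cauchy_gap(n, 2 * n) for n in (10, 100, 1000)]
```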

Let $\mathcal{H}$ be a Hilbert space, with inner product $(\cdot,\cdot)$ and
corresponding norm $\|\cdot\|$. A subset $S$ of $\mathcal{H}$ is called an
orthogonal system if $\mathbf{g} \ne \mathbf{0}$, $\mathbf{h} \ne \mathbf{0}$
and $(\mathbf{g}, \mathbf{h}) = 0$ for every pair of distinct elements
$\mathbf{g}, \mathbf{h} \in S$. An orthogonal system $S$ is called an orthogonal
basis (or complete orthogonal system) if no other orthogonal system contains
$S$ as a proper subset. An orthogonal basis $S$ is called an orthonormal basis
if $\|\mathbf{h}\| = 1$ for every $\mathbf{h} \in S$. There exists an
orthonormal basis which has countably many elements if, and only if,
$\mathcal{H}$ is separable. If $\mathcal{H}$ is a separable Hilbert space then
every orthonormal basis for $\mathcal{H}$ has the same number of elements
$N \le \infty$, and $N$ is called the dimension of $\mathcal{H}$.

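Let $\mathcal{H}$ be a separable Hilbert space, with inner product
$(\cdot,\cdot)$, corresponding norm $\|\cdot\|$, and orthonormal basis
$S = \{\mathbf{h}_i\}_{i=1}^{N}$, $N \le \infty$. If
$\mathbf{h} \in \mathcal{H}$ then the sequence of partial sums
$\sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h})\mathbf{h}_i$ converges to
$\mathbf{h}$, i.e.,

$$\lim_{n \to N} \left\| \mathbf{h} - \sum_{i=1}^{n} (\mathbf{h}_i, \mathbf{h})\mathbf{h}_i \right\| = 0,$$

and so every $\mathbf{h} \in \mathcal{H}$ has the representation

$$\mathbf{h} = \sum_{i=1}^{N} (\mathbf{h}_i, \mathbf{h})\mathbf{h}_i.$$

Furthermore,

$$(\mathbf{g}, \mathbf{h}) = \sum_{i=1}^{N} (\mathbf{h}_i, \mathbf{g})(\mathbf{h}_i, \mathbf{h}),$$

for every $\mathbf{g}, \mathbf{h} \in \mathcal{H}$. Therefore, for every
$\mathbf{h} \in \mathcal{H}$,

$$\|\mathbf{h}\|^2 = \sum_{i=1}^{N} (\mathbf{h}_i, \mathbf{h})^2,$$

which is called Parseval's relation.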
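
In finite dimension the expansion and Parseval's relation reduce to elementary
arithmetic. The sketch below assumes $\mathbb{R}^3$ with the Euclidean inner
product and a hand-picked orthonormal basis that is not the standard one; the
basis, the vectors and the variable names are illustrative choices only.

```python
import math

def inner(g, h):
    """Euclidean inner product on R^N."""
    return sum(gi * hi for gi, hi in zip(g, h))

r = 1.0 / math.sqrt(2.0)
# An orthonormal basis {h_1, h_2, h_3} of R^3 (assumed for illustration).
basis = [[r, r, 0.0], [r, -r, 0.0], [0.0, 0.0, 1.0]]

h = [3.0, -1.0, 2.0]
g = [0.5, 2.0, -4.0]

# Expansion coefficients (h_i, h)
coeffs = [inner(b, h) for b in basis]

# Representation h = sum_i (h_i, h) h_i, assembled component by component
reconstructed = [sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(3)]

# Parseval's relation: ||h||^2 = sum_i (h_i, h)^2
parseval_sum = sum(c * c for c in coeffs)

# Inner product through the coefficients: (g, h) = sum_i (h_i, g)(h_i, h)
inner_via_coeffs = sum(inner(b, g) * inner(b, h) for b in basis)
```

Here `parseval_sum` agrees with `inner(h, h)` and `inner_via_coeffs` agrees with
`inner(g, h)` up to rounding error.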

An example of a separable Hilbert space of dimension $N \le \infty$ is the space
$\ell^2_N$ of square-summable sequences of $N$ real numbers, with inner product
$(\mathbf{g}, \mathbf{h}) = \sum_{i=1}^{N} g_i h_i$, where $g_i$ and $h_i$
denote element $i$ of $\mathbf{g} \in \ell^2_N$ and
$\mathbf{h} \in \ell^2_N$, respectively. An orthonormal basis for $\ell^2_N$ is
the set of unit vectors $\{\mathbf{e}_j\}_{j=1}^{N}$, where element $i$ of
$\mathbf{e}_j$ is 1 if $i = j$ and 0 if $i \ne j$. In case $N < \infty$, the
elements of $\ell^2_N$ are usually written as (column) $N$-vectors
$\mathbf{g} = (g_1, \ldots, g_N)^T$, the inner product is then
$(\mathbf{g}, \mathbf{h}) = \mathbf{g}^T\mathbf{h}$, and the columns of the
$N \times N$ identity matrix constitute an orthonormal basis.

Let $(X, \mathcal{C}, \mu)$ be a measure space. Denote by
$\mathcal{L}^1(X, \mathcal{C}, \mu)$ the set of integrable maps from $X$ into
$\mathbb{R}^e$, and consider the function $\|\cdot\|$ defined for all
$g \in \mathcal{L}^1(X, \mathcal{C}, \mu)$ by

$$\|g\| = \int_X |g| \, d\mu.$$

The set $\mathcal{L}^1(X, \mathcal{C}, \mu)$ is a linear space, and the function
$\|\cdot\|$ is by definition real-valued, i.e., $\|g\| < \infty$ for all
$g \in \mathcal{L}^1(X, \mathcal{C}, \mu)$. The function $\|\cdot\|$ also
satisfies all of the properties of a norm, except that $\|g\| = 0$ does not
imply $g = 0$. However, $\|g\| = 0$ does imply that $g = 0$ a.e., and $g = 0$
a.e. implies that $\|g\| = 0$, for all
$g \in \mathcal{L}^1(X, \mathcal{C}, \mu)$. Two maps $g$ and $h$ from $X$ into
$\mathbb{R}^e$ are called equivalent, or are said to belong to the same
equivalence class, if $g = h$ a.e. If $g$ and $h$ are equivalent, and if
$g, h \in \mathcal{L}^1(X, \mathcal{C}, \mu)$, then $\|g\| = \|h\|$. That is,
$\|\cdot\|$ assigns the same real number to each member of a given equivalence
class of elements of $\mathcal{L}^1(X, \mathcal{C}, \mu)$, and thereby the
domain of definition of the function $\|\cdot\|$ is extended from the elements
of $\mathcal{L}^1(X, \mathcal{C}, \mu)$ to the equivalence classes of elements
of $\mathcal{L}^1(X, \mathcal{C}, \mu)$. The set $L^1(X, \mathcal{C}, \mu)$ of
equivalence classes of elements of $\mathcal{L}^1(X, \mathcal{C}, \mu)$ is a
linear space, and $\|\cdot\|$ is a norm on this space. The Riesz-Fischer theorem
states that $L^1(X, \mathcal{C}, \mu)$ is complete in this norm, i.e., that
$L^1(X, \mathcal{C}, \mu)$ is a Banach space under the norm $\|\cdot\|$. The
elements of $L^1(X, \mathcal{C}, \mu)$, unlike those of
$\mathcal{L}^1(X, \mathcal{C}, \mu)$, are not defined pointwise in $X$, and
therefore are not maps.
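
The reason for passing to equivalence classes can be seen on a toy measure
space. The sketch below assumes a three-point set $X$ with point masses, one of
which has measure zero, so that integrals become weighted sums. The measure, the
maps and the helper names `l1_seminorm` and `equal_ae` are hypothetical and
serve only to illustrate the definitions.

```python
# Point masses on X = {"a", "b", "c"}; the singleton {"c"} has measure zero.
mu = {"a": 1.0, "b": 2.0, "c": 0.0}

def l1_seminorm(g):
    """||g|| = integral over X of |g| d(mu), a weighted sum on this finite space."""
    return sum(abs(g[x]) * mu[x] for x in mu)

def equal_ae(g, h):
    """True if g = h everywhere except on a set of measure zero."""
    return all(g[x] == h[x] for x in mu if mu[x] > 0.0)

zero = {"a": 0.0, "b": 0.0, "c": 0.0}
g = {"a": 0.0, "b": 0.0, "c": 5.0}        # a nonzero map that vanishes a.e.
h = {"a": 1.0, "b": -3.0, "c": 0.0}
h_alt = {"a": 1.0, "b": -3.0, "c": 7.0}   # differs from h only on the null set

norm_g = l1_seminorm(g)          # zero although g is not the zero map
norm_h = l1_seminorm(h)
norm_h_alt = l1_seminorm(h_alt)  # same value as norm_h, since h = h_alt a.e.
```

On the maps themselves `l1_seminorm` is therefore not a norm, since `g` is
nonzero with `norm_g` equal to zero. On equivalence classes the ambiguity
disappears, because `g` and `zero` belong to the same class, as do `h` and
`h_alt`.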

Denote by $\mathcal{L}^2(X, \mathcal{C}, \mu)$ the set of square-integrable maps
from $X$ into $\mathbb{R}^e$, and consider the function $\|\cdot\|$ defined for
all $g \in \mathcal{L}^2(X, \mathcal{C}, \mu)$ by

$$\|g\| = \left( \int_X g^2 \, d\mu \right)^{1/2}.$$

Again, the function $\|\cdot\|$ assigns the same real number to each member of
any given equivalence class of elements of
$\mathcal{L}^2(X, \mathcal{C}, \mu)$, i.e., to each
$g, h \in \mathcal{L}^2(X, \mathcal{C}, \mu)$ such that $g = h$ a.e., and in
particular, $\|g\| = 0$ if and only if $g = 0$ a.e. Thus the domain of
definition of the function $\|\cdot\|$ can be extended to the equivalence
classes. The set $L^2(X, \mathcal{C}, \mu)$ of equivalence classes of elements
of $\mathcal{L}^2(X, \mathcal{C}, \mu)$ is a linear space, $\|\cdot\|$ is a norm
on this space, and $L^2(X, \mathcal{C}, \mu)$ is complete in this norm.
Therefore $L^2(X, \mathcal{C}, \mu)$ is a Banach space under the norm
$\|\cdot\|$. Moreover, this norm satisfies the parallelogram law, and therefore
$L^2(X, \mathcal{C}, \mu)$ is a Hilbert space. The polarization identity yields
the inner product $(\cdot,\cdot)$ on $L^2(X, \mathcal{C}, \mu)$,

$$(g, h) = \int_X g h \, d\mu,$$

for all $g, h \in L^2(X, \mathcal{C}, \mu)$. Again, the elements of
$L^2(X, \mathcal{C}, \mu)$ are not defined pointwise and are not maps. The
Schwarz inequality holds on $L^2(X, \mathcal{C}, \mu)$ since
$L^2(X, \mathcal{C}, \mu)$ is an inner product space, and gives Eq. (41) when
restricted to the elements of $\mathcal{L}^2(X, \mathcal{C}, \mu)$.

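Let $\mathcal{V}_1$ and $\mathcal{V}_2$ be two normed linear spaces, with norms
$\|\cdot\|_1$ and $\|\cdot\|_2$, respectively, and let $\mathcal{H}$ be a
Hilbert space, with inner product $(\cdot,\cdot)$. A bounded linear operator
from $\mathcal{V}_1$ into $\mathcal{V}_2$ is a map
$\mathcal{T} : \mathcal{V}_1 \to \mathcal{V}_2$ such that (i)
$\mathcal{T}(\alpha\mathbf{g} + \beta\mathbf{h}) = \alpha\mathcal{T}\mathbf{g} + \beta\mathcal{T}\mathbf{h}$
for all $\mathbf{g}, \mathbf{h} \in \mathcal{V}_1$ and
$\alpha, \beta \in \mathbb{R}$, and (ii) there exists a constant
$\gamma \in \mathbb{R}$ such that
$\|\mathcal{T}\mathbf{h}\|_2 \le \gamma\|\mathbf{h}\|_1$ for all
$\mathbf{h} \in \mathcal{V}_1$. A bounded linear operator
$\mathcal{T} : \mathcal{H} \to \mathcal{H}$ is called self-adjoint if
$(\mathcal{T}\mathbf{g}, \mathbf{h}) = (\mathbf{g}, \mathcal{T}\mathbf{h})$ for
all $\mathbf{g}, \mathbf{h} \in \mathcal{H}$, and is called positive
semidefinite if $(\mathbf{h}, \mathcal{T}\mathbf{h}) \ge 0$ for all
$\mathbf{h} \in \mathcal{H}$.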

At the beginning of this subsection, the field of scalars for linear spaces V 
was taken to be the real numbers, and inner products were therefore defined to 
be real-valued. Thus the Hilbert spaces defined here are real Hilbert spaces. It 
is also possible, of course, to define complex Hilbert spaces. One property that 
is lost by restricting attention in this chapter to real Hilbert spaces is that, while 
every positive semidefinite operator on a complex Hilbert space is self-adjoint, 
a positive semidefinite operator on a real Hilbert space need not be self-adjoint 
(e.g., Reed and Simon (1972, p. 195)). Covariance operators on a real Hilbert
space are necessarily self-adjoint as well as positive semidefinite, however, as
discussed in Appendix A.3.
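
A two-dimensional example makes the real case concrete. The sketch below
assumes $\mathbb{R}^2$ with the Euclidean inner product and a hypothetical
matrix `T` (the identity plus a skew-symmetric part), chosen for illustration
and not taken from Reed and Simon. Algebraically,
$(\mathbf{h}, T\mathbf{h}) = h_1^2 + h_2^2 \ge 0$ for every $\mathbf{h}$, yet
$(T\mathbf{g}, \mathbf{h}) \ne (\mathbf{g}, T\mathbf{h})$ for the unit vectors.

```python
def inner(g, h):
    """Euclidean inner product on R^N."""
    return sum(gi * hi for gi, hi in zip(g, h))

def apply(T, h):
    """Apply the matrix T (given as a list of rows) to the vector h."""
    return [inner(row, h) for row in T]

# Hypothetical operator on R^2: identity plus a skew-symmetric matrix.
T = [[1.0, 1.0],
     [-1.0, 1.0]]

def quadratic_form(h):
    """(h, Th); for this T it equals h_1^2 + h_2^2."""
    return inner(h, apply(T, h))

g = [1.0, 0.0]
h = [0.0, 1.0]
lhs = inner(apply(T, g), h)   # (Tg, h)
rhs = inner(g, apply(T, h))   # (g, Th)
is_self_adjoint = (lhs == rhs)

# Spot check of positive semidefiniteness on a grid (not a proof; the
# algebraic identity above covers all h).
samples = [[float(x), float(y)] for x in range(-3, 4) for y in range(-3, 4)]
is_psd_on_samples = all(quadratic_form(v) >= 0.0 for v in samples)
```

The skew-symmetric part of `T` contributes nothing to the quadratic form, so
positive semidefiniteness alone cannot detect it in the real case.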